Preface


Tests and measurements—their techniques and devices—are 
valuable aids to the kind of pupil evaluation that encourages 
optimum growth. However, there are teachers who either 
have blind faith in the efficacy of tests or are unreasonably 
skeptical concerning their use. Our aim is to help teachers use 
tests appropriately and constructively. 

This book is an outgrowth of the authors' experiences in 
conducting various courses in "tests and measurements." In 
these courses we have found it advisable to cover less mate- 
rial with more illustrative examples in order to achieve good 
communication. Even experienced teachers, whom both 
authors have met in extension classes, are frequently appre- 
hensive of the statistics and technicalities involved. Students 
preparing to be teachers also frequently doubt their ability to 
comprehend test usage. We have dared to presume that 
teachers and teacher candidates in other states are much like 
those encountered in Oregon: (1) They need to have the 
subject of testing presented with a minimum of statistics. (2) 
They need to know the limitations of tests. (3) They need to 
perceive the substantial aid that appropriately used tests can 
give. (4) They should have these “needs” met in an effective 
manner. Our aim, then, is to present the basic features of 
tests and testing in terms understandable to classroom 
teachers. 

Brevity has been one of our guideposts, for the sheer bulk 
of some books on measurement intimidates the teachers who 
enroll in our classes. The desire to make our treatment brief 
has sometimes caused us trouble in the writing of this book— 
we should have liked to go into more detail in explaining 
contributing and conditioning factors in various situations. In 
the first draft the chapters were longer and more cautiously 
detailed, but one or the other author pared and whittled until
both agreed that the minimum for effective communication 
had been reached. We hope that users of the book who 
might have written it differently will bear in mind the desir- 
ability of brevity and simplicity in the presentation of basic 
material. 

Another of our guideposts has been the intent to direct the 
material to classroom applications. The question, "Is this 
section (or paragraph) pertinent to the kinds of work 
teachers do?" was asked repeatedly while the chapters were 
being written and in each author's evaluation of the other's 
chapters. We feel that we have come close to fulfilling our 
criterion of classroom pertinence. 

Some readers will not be able to share wholeheartedly our 
criticism of grades and personality tests. The senior author 
has prevented the expression of skepticism regarding these 
techniques from being even more emphatic. Our hope is that 
the presentation will stimulate thinking—as did the prepara- 
tion of the material. Instructors who encourage their students 
to discuss the points of view presented will find that students'
evaluations will elicit many of the concepts of evaluation and 
bring forth a recognition of the merits and shortcomings of 
testing devices. 

Our third guidepost was to keep the book in such form as 
to provide for flexible use. Brevity makes it possible for in- 
structors to develop their own points of emphasis. For those 
who plan further study, we have selected and annotated 
readings and provided study and discussion items. 

We wish to express our gratitude to the publishers of books 
and tests who have given us permission to use copyrighted 
materials. We also thank Mrs. Alta Diment, who typed and 
edited the manuscript from what was in many instances very 
rough copy. 

Denis Baron 
Harold W. Bernard 
CHAPTER ONE


Sampling Pupil Behavior 


In earlier times costly tunnels were driven into mountains 
to find out whether quartz outcroppings indicated the pres- 
ence of ore beneath the surface. Often the effort was fruitless 
— discouraging, time-consuming, and financially ruinous. 
Today it is possible to save time and money by using diamond 
drills, which, driven from various locations and at different 
angles, bring up cores, or samples, of the mountain’s interior.
The extent of the ore body can be determined fairly accurately 
from these cores. Promising cores are sent to assayers, who 
determine by tests the amounts of lead, silver, and copper 
contained in the samples. The results of these tests—the cores 
and the assayer’s chemical analysis of them—make it possible 
to estimate the value of the ore that will be obtained if the 
expensive tunnel or shaft is driven. 

There are tools available to teachers today which, like the 
diamond drill, save time, energy, and frustration in working 
with pupils. These tools are tests—samples of the behavior 
and traits of individual pupils that indicate quickly and with 
reasonable accuracy their status at a particular time and their 
potential. Tests provide samples of intelligence, knowledge, 
social orientation, and special aptitudes. They enable the 

EVALUATION TECHNIQUES FOR CLASSROOM TEACHERS


teacher to get a clearer view of the "inner workings" of his 
pupils. 

The miner takes a chance when he drives a tunnel; the pres- 
ence of ore does not ensure his success. Faulty earth may 
cause cave-ins, underground water may be expensive to com- 
bat, a slump in the stock market may create a poor sales 
field, and technological advances may reduce the demand for 
his ore. So, too, although the samples that the teacher obtains 
through tests may promise well, serious illness, a broken 
home, a shift in values (as during a war), or a quarrel with a 
cherished friend may threaten the realization of what is indi- 
cated by the tests. 

Tests, in short, are not panaceas; they do not solve all the 
problems of education and classroom management. They are 
useful devices in "sizing up" pupils in a better than trial- 
and-error fashion. We would prefer a physician who felt 
our pulse, took our temperature, and made a urinalysis and a 
blood count to one who merely glanced at us and prescribed 
his stock “pink pills.” Similarly, the pupil, if he had the back- 
ground for fuller understanding, would prefer an educational 
program fitted to his particular needs to an automatically pre- 
scribed program. The analysis the teacher makes on the basis 
of observation of the pupil certainly has its place; when it 
is supplemented by analytically interpreted test data, the 
child’s school life is more rewarding. We should, in short,
obtain samples of the pupil's intelligence, interests, social
adjustment, knowledges, skills, potentialities, sensory acuity,
and physical health if we are to work with him as efficiently
as possible.


WHAT TESTS DO 


The tests made by the miner and the physician do not yield 
answers or make analyses. The person who knows the tests
and what they do must make the diagnoses. Tests provide
only data upon which to base a diagnosis. Educational and 
psychological tests do not make analyses or suggest what 
should be done; they give indications which may serve to 
sharpen and clarify the judgments teachers make on the 
basis of their experience, training, and understanding. 


Attitudes toward Tests 


As samples of behavior which yield indications, tests de- 
pend for their usefulness upon the attitude and knowledge 
of teachers. This entire book is directed toward the develop- 
ment of this essential background. Before we proceed, how- 
ever, let us examine some characteristic attitudes toward tests. 

1. Some persons have blind confidence in tests. They ap- 
pear to feel that all one has to do to solve an educational 
problem is to give a test, record results, and file the data. 
This attitude toward test results is, of course, absurd, because 
there are no simple answers in relation to complicated person- 
alities. Too many teachers give tests in order to discover that 
Johnny has an IQ of 90, is up to age-grade standards in most
of his schoolwork, and is "average" in terms of personal and
social adjustment. The results are recorded in his cumulative 
folder, and classwork goes on in the same perfunctory man- 
ner as before. 

2. Tests are often regarded with an element of fear. Undue 
emphasis may have been placed on test results in the teacher's 
school experiences, and he may fear that tests will produce 
similar anxiety in pupils. This attitude is unfortunate. It is 
not the tests which should be feared but the misuse of test 
data. If one is to be failed because of his test score, he has 
reason to be apprehensive. However, if tests are used to pro- 
mote understanding and diagnosis, they will be welcomed. 
Actually, most of us like to take tests—if the results are not 
to be used against us. Many people enjoy the tests in Time, 
Look, and the Reader's Digest because nothing but self-evaluation
is based on the results. It is entirely possible that children
could learn to enjoy educational tests if the element of
threat were removed and if the results were useful to pupils 
and teachers in their efforts to achieve better understanding. 

3. Tests are regarded by some with tentative confidence. 
This attitude approaches the sound and realistic view ex- 
pressed by the person who says, “I'll take the tests for what 
they are worth and permit myself to be guided by the results." 
If, however, the hesitating confidence is expressed as “I'll use
the results as long as they agree with views I already hold," 
the test results can serve little constructive purpose. 

4. An attitude that seems sound is to accept test data as 
rather accurate supplementary evidence. The teacher who ac- 
cepts this view uses tests to get a more complete picture of 
the individual. He realizes that tests are not entirely accurate, 
but he uses them with due regard for present limitations as he 
and others work to develop better ones. 

This fourth view would be more readily adopted if teachers 
generally had a better understanding of what tests are and do. 
It will, for instance, be helpful to realize that tests are not 
direct and absolute measures. They are means to evaluation. 
One can learn how tall a youngster is by using a yardstick. 
His weight can be determined by using a scale. His pulse can 
be counted. His blood pressure can be measured by a sphyg- 
momanometer. Intelligence, personality, knowledge, and in- 
terest, however, cannot be measured directly but must be 
inferred from indirect measures. Specifically, intelligence may 
be inferred from the subject's answers to a limited (and se- 
lected) number of questions, albeit the questions are designed 
to sample various areas of his total knowledge. Personality 
adjustment is evaluated by means of responses to questions 
designed to sample areas of social, academic, and personal ad- 
justment. Similarly, knowledge and interest are inferred from 
representative questions which survey only part of the total
area of knowledge and interest. 


Measurement and Evaluation 


Words often make understanding possible, but they some- 
times cloud the issue. Unless the specific connotation of a 
word has been learned, prior interpretations may get in the 
way of understanding. Thus, measurement in education has 
a slightly but significantly different meaning from the same 
word applied to carpentry, a purchase of sugar, or a bank 
account. These types of measurement can be accurate—re- 
peated measurements would yield identical results—but meas- 
urement in education cannot be repeated with identical out- 
comes. It might clarify our thinking if the word evaluation 
were substituted for measurement. But since the word meas- 
urement appears in educational literature and in discussions, 
it is advisable to indicate its specific connotation. Measure- 
ment by testing may be considered as a means by which eval- 
uation is achieved. Evaluations are often made without the 
basic data supplied by measurement, but sound evaluation is 
based upon the results of measurement. 

Considerable measurement is involved in purchasing a 
home. The amount of floor space; the cost of brickwork, lumber,
and wiring; and the size of the lot are among the measurements
to be considered. The judgments based on these measurements
are an evaluation. Further, some intangible items
would enter the picture: the style of the house, convenience 
of room arrangement in terms of family needs, and com- 
munity environs would be taken into account although they 
are beyond the limits of precise measurement. 

Similarly, some significant educational factors are beyond 
the limits of measurement by tests. For example, tests do not 
measure drive or motivation to use the knowledge or intelli- 
gence indicated by tests, nor do they measure the view that 
the pupil takes of learning or the appeal or effectiveness of
teaching. Such data as anecdotal records, the teacher's evalu- 
ation of his pupils, health data, and recorded observations of 
play and social behavior should be used to supplement and 
validate test data. Both teachers and pupils must realize that, 
as in buying a home, measurement is at best a basis for eval- 
uation. By means of it one can arrive at a more accurate 
evaluation than could be achieved by trial and error or by 
personal opinion. 

By providing the measurements upon which we can base 
evaluations, educational tests can do much to help improve 
the effectiveness of instruction and guidance. (1) Tests can 
help to estimate the present potential of the pupil to learn. 
(2) They can give fairly accurate information regarding the 
pupil's academic knowledge. (3) They can show about how 
much a pupil has grown in a given period of time and thus 
help to evaluate the efficiency of methods of teaching. (4) 
Tests can help to locate specific areas of difficulty (though 
they do not tell what should be done about the difficulty). 
(5) Properly used, tests can be a factor in the motivation 
of pupils. (6) Test results can give guidance in the more 
equitable grouping of pupils for the purpose of economy in 
instruction. (7) Tests can provide clues to intelligent guid- 
ance of pupils in their academic choices and their personal 
adjustment. (8) Tests can provide supplementary data lead- 
ing to a more objective evaluation of pupil status and prog- 
ress. 

It should be noted, however, that tests do not completely
accomplish any of these things. They only help by providing
clues and corroborative data.


TYPES OF TESTS AND MEASUREMENTS 


Tests differ widely with regard to their nature and the pur- 
poses they are designed to serve. According to the way they 
are designed and used, tests may be classed as verbal or nonverbal,
performance or pencil-and-paper, and group or individual.
A verbal test is one in which language plays a major
part. The ability of the pupil to speak, read, and write
determines in a major degree his effectiveness on this kind of
test. His ability to repeat statements and his ability to follow 
written or spoken directions are sampled by verbal tests. Non- 
verbal tests indicate the pupil's ability to see the similarity or 
dissimilarity between pictorial materials or geometric figures, 
follow mazes, or put parts of a puzzle together. Speed of 
manipulation, accuracy of movement, and sharpness of per- 
ception are sampled by this type of test, and the use of lan- 
guage is minimized but not eliminated. In a performance test, 
the subject may be asked to maneuver blocks into a pictured 
design, place the parts of a picture-board puzzle together, or 
repeat a series of digits given to him orally. In a pencil-and- 
paper test, the subject records his own answers. He checks the 
answers he selects, draws his way through a maze, or computes
the answer to an arithmetic problem. Quite often, although
not always, performance tests are nonverbal and pencil-and-paper
tests are largely verbal. A group test is simply
a test which a number of pupils take simultaneously. An in- 
dividual test is one which requires one examiner for each ex- 
aminee. 

Tests may also be classified as to their purposes. Some 
are designed to sample aptitudes, others achievement, and still 
others specific difficulties. The most common aptitude test is 
the intelligence test, which is designed to indicate the pupil's 
capacity to learn. Musical-ability tests and reading-readiness 
tests are other commonly used aptitude tests. Achievement 
tests indicate the pupil's level of performance in specific academic
areas such as reading, spelling, arithmetic, language
usage, and comprehension of vocabulary. Tests designed to 
indicate specific areas of difficulty are called diagnostic tests. 
A diagnostic test in arithmetic may serve to indicate whether 
the pupil's specific difficulty is in multiplying, adding, or dividing
or whether there is some particular number combination
that he has learned incorrectly and uses consistently,
e.g., 6 x 9 is 52.
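
The kind of tallying that a diagnostic test makes possible can be sketched in a few lines of Python. This is only an illustration under assumed conditions: the function name consistent_errors, the min_repeats threshold, and the pupil responses are hypothetical and do not come from any published test. The sketch looks for multiplication facts that a pupil misses in the same way more than once.

```python
from collections import Counter


def consistent_errors(responses, min_repeats=2):
    """Return {(a, b): wrong_answer} for facts missed the same way repeatedly.

    `responses` is a list of (first factor, second factor, answer given).
    An error counts as "consistent" when the identical wrong answer
    appears at least `min_repeats` times (an assumed threshold).
    """
    wrong = Counter(
        (a, b, given) for a, b, given in responses if given != a * b
    )
    return {
        (a, b): given
        for (a, b, given), count in wrong.items()
        if count >= min_repeats
    }


# Hypothetical responses from one pupil.
responses = [
    (6, 9, 52), (7, 8, 56), (6, 9, 52),
    (4, 7, 28), (6, 9, 52), (8, 8, 63),
]
errors = consistent_errors(responses)
print(errors)  # {(6, 9): 52}
```

The single slip on 8 x 8 is ignored, while the repeated 6 x 9 error is reported. As with any test result, the tally only locates the difficulty; it does not say what should be done about it.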


Scales and Inventories 


Many of the instruments that are helpful for more objective 
pupil evaluation are called by other names than "tests." The 
quality of a sixth grader's handwriting is difficult to evaluate, 
but a handwriting scale on which there are graded examples 
of writing may serve to objectify the judgment. Spelling 
scales group together words that have a similar degree of
difficulty and familiarity and arrange them in order of increas- 
ing difficulty. There are rating scales which are used to re- 
cord interpersonal evaluations. (See Chapter 11.) For exam- 
ple, a three-point or five-point scale may list under a heading 
such as dependability the following degrees of possession of 
the quality: “(1) Always dependable (2) Usually depend- 
able (3) Unpredictable (4) Often undependable (5) Quite 
undependable.” Clearly such a scale depends largely upon the 
rater’s interpretation of the terminology used. Interests are 
sometimes studied through a preference scale. For example, 
interest in the study of biology may be scored, “Highly interested,
somewhat interested, indifferent, uninterested, or
strongly dislike.” Used with due regard for their subjective
limitations, scales provide supplementary data of value in
pupil evaluation and guidance.


As a result of the difficulty involved in "measuring" personality,
the term inventory has come to be widely used in
evaluating social and personal adjustment. The inventory is
made up of questions, to which there are no "right" or
"wrong" answers, concerning how the individual feels he
would act in the situation described. No one question is regarded
as crucial, but the trend of all the answers is taken as
a clue to the person’s total adjustment in home life, school
life, interests, and views regarding what is correct and incor- 
rect in daily behavior. 

The questionnaire technique is often used to study atti- 
tudes and interests. The subject is asked to indicate whether 
or not he agrees with a number of statements regarding certain
situations, for example, the activities and requirements
of school. Again, no one question is regarded as giving an in- 
contestable clue to the respondent's orientation, but the total 
is regarded as indicative of trends in values and attitudes.


Projective Techniques


Projective techniques are coming to be regarded as valuable
clues to personality. The projective technique permits
the subject to "add structure to an unstructured situation";
that is, the individual injects his own meaning (theoretically,
his own personality) into the situation. Specifically, a child
may be given a set of dolls and doll furniture and told to do 
anything he wants to with them. What he does and how he 
treats the playthings is considered to be a reflection of his own 
personality. A picture with ambiguous content may be shown 
to the subject. What he describes as the content of the picture 
is, in part at least, a product of his own imagination. He may 
be told part of a story and asked to finish it; the ending he 
provides is considered a reflection of his inner feelings, past 
experiences, and wishes. 

Obviously, it is hazardous to interpret these "projections"
of the pupil's personality too precisely. But if used cautiously 
and interpreted in the light of more objective supplementary 
data, including interviews and observations, projective tech- 
niques may provide valuable clues in understanding children. 
Projective techniques illustrate the fact that a distinction must 
be made between measurement and evaluation. Results from
these devices must be interpreted or evaluated; hence it is
necessary that the teacher know something of the nature and
dynamics of personality. Projective techniques include such
activities as interpreting pictures, completing partially told
stories or incomplete sentences, playing with toys, describing
what one sees in ink blots or pictured cloud formations,
painting, drawing, modeling clay, and writing stories and
poems.


STANDARDIZATION 


It is important that teachers understand the significance 
of the standardization of tests. Most of the published tests 
with which teachers work are carefully standardized; that is, 
norms or averages for each test have been established. For 
example, an achievement test in arithmetic is given on an 
experimental basis to a number of pupils in various locali- 
ties. Questions that prove to be ambiguous are eliminated, 
and questions that seem to have little or no power to differentiate
between high-scoring and low-scoring pupils are removed.
The reconstructed test is then given to several hundred
pupils, and the norms are established. Medians are
perhaps most commonly used for this purpose, since scores 
typical of pupils at various grade levels are determined. This 
typical score for each grade is taken to be the norm or stand- 
ard for that grade, and the test is said to be standardized. 
(Standardization will be dealt with in greater detail in Chapter 4.)
A standardized test has the advantage of adding objectivity
to the interpretation of the score. But it would be a
mistake to think that the norm represents a standard in the
sense that it is a goal to be achieved by individual students or
a level of accomplishment with which to be satisfied in the
case of others. Nor should the attractiveness of a standardized
test lead the teacher to repudiate teacher-made tests, which
have a place in a balanced program of evaluation. Teacher-made
tests can be made to fit the local situation better; they
can more pointedly refer to what has been taking place in a
particular classroom; and they are economical devices for
practice and drill.
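
The two steps of standardization described above, removing questions that fail to differentiate high-scoring from low-scoring pupils and then taking a typical (median) score for each grade as its norm, might be sketched as follows. This is a rough illustration only. The function names, the tryout data, and the upper-half-minus-lower-half index are our assumptions; test publishers work with far larger samples and more refined item statistics.

```python
from statistics import median


def discrimination_index(item_correct, total_scores):
    """Proportion passing the item in the upper half minus the lower half.

    `item_correct` holds 1 (right) or 0 (wrong) for each pupil and
    `total_scores` holds the same pupils' total test scores.
    Assumes at least two pupils; with an odd number the middle pupil
    is left out of both halves.
    """
    ranked = sorted(zip(total_scores, item_correct), key=lambda pair: pair[0])
    half = len(ranked) // 2
    lower = [correct for _, correct in ranked[:half]]
    upper = [correct for _, correct in ranked[-half:]]
    return sum(upper) / half - sum(lower) / half


def grade_norms(scores_by_grade):
    """Take the median score of each grade as that grade's norm."""
    return {grade: median(scores) for grade, scores in scores_by_grade.items()}


# Hypothetical tryout results for six pupils.
totals = [10, 12, 15, 20, 22, 25]
item_a = [0, 0, 1, 1, 1, 1]  # passed mostly by high scorers: worth keeping
item_b = [1, 0, 1, 0, 1, 0]  # unrelated to total score: a candidate for removal
print(round(discrimination_index(item_a, totals), 2))
print(round(discrimination_index(item_b, totals), 2))

# Hypothetical scores on the reconstructed test, by grade.
tryout = {4: [12, 15, 17, 18, 21], 5: [16, 19, 22, 24, 27], 6: [20, 25, 28, 30, 33]}
norms = grade_norms(tryout)
print(norms)  # {4: 17, 5: 22, 6: 28}
```

An item with an index near zero or below tells little about who knows the material. The norms, in keeping with the caution above, describe what is typical and are not goals to be reached.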


SUMMARY 


Tests are instruments for "sizing up" pupils. They yield
clues, or “measures,” based on sampling techniques, which
can be used (when properly interpreted) to evaluate the motivation,
ability, and growth of individual pupils. As yet there is
no single test which provides enough clues to evaluate the
“whole child.” Hence, a balanced program for sizing up pupils
will include a variety of tests which differ in mode of construction
and in purpose. Used as a means to better evaluation,
tests can save the time and energy of teachers and
pupils, just as samples taken by diamond drills save the time
and energy of the miner.


STUDY AND DISCUSSION EXERCISES 


1. How does the growth principle which states that “growth is 
a product of the interaction of the organism with its environment” 
bear upon the measurement concept in education? 

2. What could the classroom teacher do to make test taking a
less emotionally upsetting experience?

3. In your own words, distinguish between evaluation, or appraisal,
and measurement.

4. List a number of things that tests do not do.

5. How do you think a classroom teacher might most advantageously
use a personality inventory?

6. In what ways might a projective technique be superior to a
personality inventory?
ventories and their uses in guidance. The author observes that
with changed environmental circumstances there is a chance
that the IQ will change.
Educational Test Bulletin, no. 1, "How Tests Can Improve Your
School," Hollywood, Calif.: California Test Bureau.
This bulletin may be obtained free by teachers who submit
their titles and grade assignment. Starting with a brief statement
of the purposes of education, the authors show how tests
can facilitate the achievement of those purposes. The use and
cost of tests is briefly indicated.
Ross, C. C.: Measurement in Today's Schools, 3d ed. (revised by 
J. C. Stanley), Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1954, 
pp. 3-59. 
In these two chapters the authors discuss the place of measure- 
ment in the modern school and in the modern world, give a 
brief background of the testing movement, and describe some 
of the difficulties that have been encountered in making testing
scientific.
Rummel, J. Francis: Know Your Pupils, Salem, Ore.: Oregon
State Department of Education (no date), 37 pp.

This booklet deals with the meaning of evaluation and the instruments
and methods for facilitating evaluation. Some valuable
suggestions for interpretation of data, together with precautions,
are given.


Wrightstone, J. Wayne, Josiah Justman, and Irving Robbins:
Evaluation in Modern Education, New York: American Book
Company, 1956, pp. 3-59. 
The meaning of evaluation and measurement, the historical 
background of evaluation, recent trends, steps in evaluation, 
types of evaluative devices, and characteristics of good test in- 
struments are discussed. 


CHAPTER TWO 


How to Identify a "Good" Test


As we have seen, tests are used to evaluate certain personality 
characteristics or increments of growth. Some provide better 
bases for evaluations than others; hence tests themselves have 
to be evaluated. Since our purpose in testing is to see the 
pupil more clearly, we must use those instruments that facili- 
tate clear vision. A camera or microscope with a scratched, 
clouded, or imperfect lens gives a distorted picture. So, too, 
a faulty, poorly constructed test provides a distorted picture of 
the individual whom we wish to understand better. This chapter
is designed to help the teacher understand what characteristics
of tests he should consider in selecting the instruments
which are to promote improved pupil behavior.


willingness to take orders, or
personal cleanliness. As a result of the “halo effect,” or the
tendency to believe that a child who is cheerful, clean, and
cooperative is also bright, teachers are often surprised when
a quiet or sullen lad achieves a high score on a test or when 
a blond, curly-headed little girl in a starched pinafore makes 
a low score. The element of subjectivity often leads to mis- 
taken evaluations. 

If a test is to increase objectivity, however, it must itself 
be objective; in it, the exercise of the teacher's judgment must 
be reduced to a minimum. The answers to questions must be 
easy to interpret as either right or wrong and must leave little 
or no occasion for the teacher to say, “I’m sure he had the 
right idea in mind; I'll give him credit.” Objectivity is in- 
creased by the use of short-answer test questions such as 
simple-recall (in which one word will correctly answer the 
question), true-false, multiple-choice, and matching ques- 
tions. If the material of these short-answer questions were dealt 
with in a hundred-word composition, the possibility of objec- 
tivity in scoring would be decreased; that is, different scorers 
would assign different values to the answer. However, even in 
the essay answer it has been found that objectivity is increased 
through the use of pre-formed model answers. Some intelli- 
gence tests provide such model answers, assigning varying 
weights, or values, to correct but differing answers. For ex- 
ample, some items on individual intelligence tests may be 
answered in several ways, but the sample answers provided 
in the manual give the test scorer some basis for scoring responses.
By and large, the classroom teacher achieves objectivity
by using the scoring key provided with the tests and by
carefully adhering to printed directions for administration,
scoring, and interpretation.


The Criterion of Validity 

Since tests are designed to give an accurate picture of some
aspect of the personality of the pupil, it is important that they
actually measure what they are designed to measure. This
characteristic of tests is called validity. A test is valid when
the results it obtains correspond closely with those obtained 
by means of other criteria evaluating the same trait. 

The meaning of validity may be made clear by describing 
an invalid "arithmetic" test which consisted of twenty items, 
each accompanied by a lengthy discussion intended to clarify 
the problem. Many of the pupils did ten of the twenty prob- 
lems accurately but failed to finish the entire test in the time 
allotted. Study of the results indicated that the pupils’ arith- 
metic scores corresponded very closely to their scores on read- 
ing tests. It was suspected that the test actually tested reading 
as much as arithmetic. When the test was redesigned and the 
reading material substantially reduced, the subsequent scores 
on arithmetic were much less similar to the pupils' reading
scores, and the results of the second test corresponded more
closely with the pupils’ work in arithmetic classes. Thus the 
second test was considered to have greater validity. (Standing 
in arithmetic class was the criterion by which the validity of 
the test was judged.) 

The concept of validity is further illustrated by the fact 
that intelligence tests are considered to be valid when their 
results correspond to scores on other recognized intelligence 
tests, to the pooled judgment of a panel of experts who know 
the examinees, to an estimate of the subject's intelligence as 
inferred from measures of his adjustment in various situations, 
or to the subject's achievement in school. The approach to validity
from the standpoint of adjustment to life and success in
school achievement is illustrated by L. M. Terman’s follow-up
studies of gifted children.* Subjects tested and studied
in about 1921 were restudied after approximately twenty-five
years. They were found, rather uniformly, to have been successful
in schoolwork, to have achieved enviable occupational
status, earned superior incomes, established stable marriages,
developed a variety of constructive avocational interests, and
in many ways to be well on the road to eminence as participating
citizens and personally effective individuals. Hence the
tests are considered to be valid from the standpoint of the
"test of living."

* Lewis M. Terman and Melita H. Oden, The Gifted Child Grows
Up, Genetic Studies of Genius, 4, Stanford, Calif.: Stanford University
Press, 1947.

It is advisable at this point to explain the concept of the 
coefficient of correlation, which makes it easier to compre- 
hend some of the characteristics of a good test. A coefficient 
of correlation (or of validity, or reliability) is a number which 
indicates the dependability of the predictions that are made in 
terms of that number. Wind direction can always be inferred 
from the direction of the smoke drift; the correlation between 
the two is 1.00, or perfect. The volume of a gas is inversely 
proportional to the pressure exerted upon it, temperature
remaining constant; the coefficient of correlation is -1.00.
Relationships among human traits lie somewhere between these
two extremes. The correlation between two valid and reliable
measures of intelligence would probably be about .90. The
correlation between intelligence and reading would be
somewhere in the vicinity of .50 or .60 (read this "point five
zero" or "point six zero"). The correlation between size and
intelligence would be positive but so low as to make
predictions for individuals extremely dubious (about .10 to
.20). These figures are not percentages. They are figures which
indicate how much credence can be placed in predictions based
on that particular coefficient.
Table 1 should be read somewhat as follows: A coefficient
of correlation of .90 increases forecasting efficiency by 56 per
cent over pure guess and provides 78 chances out of 100 of
predicting correctly from one measure the approximate level
of performance on another measure. There are 22 chances out
of 100 that the prediction will be incorrect. Even a coefficient
of correlation of .99 is only 95 per cent better than chance.
(The chances in 100 of predicting future behavior are computed
by adding the per-cent increase of predicting efficiency to 100
and dividing by two.) Thus the user of a test must be aware of
a noticeable margin of error in the predictive value of
measures involving the concept of correlation.

TABLE 1*

                 Percentage increase    Chances in 100 of predicting
Correlation      in predictive          at-or-above, and below,
coefficient      efficiency             average in future behavior

0.00              0.0                   50-50
0.10              0.5                   50.25-49.75
0.20              2.0                   51-49
0.30              5.0                   52.5-47.5
0.40              8.0                   54-46
0.50             13.0                   56.5-43.5
0.60             20.0                   60-40
0.70             29.0                   64.5-35.5
0.80             40.0                   70-30
0.90             56.0                   78-22
0.95             69.0                   84.5-15.5
0.98             80.0                   90-10

* Clifford P. Froehlich and John G. Darley, Studying Students:
Guidance Methods of Individual Analysis, Chicago: Science Research
Associates, Inc., 1952, p. 54.

Generally speaking, coefficients of correlation may be viewed
as follows:²

.00 to .20 denotes indifferent or negligible relationship;
.20 to .40 denotes low, slight correlation;
.40 to .70 denotes substantial or marked relationship;
.70 to 1.00 denotes high to very high relationship.

² Henry E. Garrett, Statistics in Psychology and Education, New
York: Longmans, Green & Co., Inc., 1947, p. 333.
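
The figures in Table 1 agree with what statisticians call the index of forecasting efficiency, 100 × (1 − √(1 − r²)), combined with the rule given above for converting that index into chances in 100. The short Python sketch below is offered only as an illustration. The formula is an assumption inferred from the values in the table, the function names are our own, and the handling of boundary values such as exactly .20 or .40 (which fall in two of the quoted ranges) is an arbitrary choice.

```python
import math


def forecasting_efficiency(r):
    """Per-cent improvement over pure guess.

    Assumes the index of forecasting efficiency, 100 * (1 - sqrt(1 - r^2)),
    which reproduces the middle column of Table 1.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    return 100.0 * (1.0 - math.sqrt(1.0 - r * r))


def chances_in_100(r):
    """Add the per-cent increase to 100 and divide by two, as the text describes."""
    return (100.0 + forecasting_efficiency(r)) / 2.0


def describe(r):
    """Verbal label for the size of a coefficient, following the quoted ranges.

    Boundary values are assigned to the higher range (an arbitrary choice).
    """
    size = abs(r)
    if size < 0.20:
        return "indifferent or negligible"
    if size < 0.40:
        return "low, slight"
    if size < 0.70:
        return "substantial or marked"
    return "high to very high"


for r in (0.10, 0.50, 0.90):
    print(r, round(forecasting_efficiency(r), 1),
          round(chances_in_100(r), 2), describe(r))
```

Running the loop for other coefficients shows how slowly predictive efficiency rises until the coefficient becomes quite large, which is the point the table is meant to make.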


The coefficient of correlation may be used to give a
numerical indication of the validity of a test. If the test
were perfectly valid, the coefficient of validity would be
expressed by the number 1.00. Each pupil would have the same
rank in schoolwork as on the test, if rank in schoolwork were
the criterion by which the validity of the test was judged. In other
words, with perfect validity (1.00), an individual's rank would 
be the same on the test as in a ranking by experts or by school 
marks. Actually, this does not happen in practice, but the 
amount of shift in relative position is relatively small in a 
highly valid test. This is illustrated in the following tabula- 
tions: 


Subject    Rank on test    Rank in schoolwork
A          1               1
B          2               3
C          3               2
D          4               5
E          5               4

According to some methods of computing correlations, the
coefficient in the above illustration (between the test and
the criterion of rank in schoolwork) is .80, a fairly typical
coefficient for mental tests. If the rank in schoolwork of
pupils B and C and of pupils D and E were reversed, there would
be perfect correlation between test results and rank in
schoolwork. The closer the coefficient of validity is to plus
1.00, i.e., the closer the test is to the criterion, the better
it measures what it purports to measure. In selecting tests the
teacher should consult published reviews and catalogues of the
test to learn the basis on which the validity of the test was
established (the criterion of success) as well as the validity
coefficient claimed for the test. Typically, validity
coefficients are lower than reliability coefficients, so
differences between the two need not make the selector of tests
apprehensive. Although the coefficient of validity should be as
high as possible, it need not be as high as .70 or .80. Lee J.
Cronbach has cited tests used in making military
classifications which had coefficients as low as .45 but which
were useful in predicting performance in specified military
activities.³
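
One method that yields the figure of .80 for the five pupils tabulated above is the rank-difference (Spearman) formula, 1 − 6Σd² / n(n² − 1), where d is the difference between a pupil's two ranks. The sketch below assumes that this is the method meant; the text does not name it, and the function name is our own. The formula in this simple form applies only when there are no tied ranks.

```python
def rank_difference_correlation(ranks_x, ranks_y):
    """Spearman rank-difference coefficient for untied ranks:
    1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(ranks_x)
    if n != len(ranks_y) or n < 2:
        raise ValueError("need two rank lists of equal length (at least 2)")
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1.0 - 6.0 * sum_d_squared / (n * (n * n - 1))


# Pupils A through E, as in the tabulation above.
rank_on_test = [1, 2, 3, 4, 5]
rank_in_schoolwork = [1, 3, 2, 5, 4]

print(round(rank_difference_correlation(rank_on_test, rank_in_schoolwork), 2))  # 0.8

# With B/C and D/E reversed in schoolwork, the two rankings agree exactly.
print(rank_difference_correlation(rank_on_test, [1, 2, 3, 4, 5]))  # 1.0
```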


The Criterion of Reliability 


In addition to being valid, a good test must also be depend- 
able for consistent measurements. The rubber rule that fish- 
ermen are reputed to use is an unreliable measure. The plat- 
inum measuring stick at the U.S. Bureau of Standards, pro- 
tected from the air and maintained at a uniform tempera- 
ture, can be depended upon to yield the same measurement 
year after year. The consistency with which a test measures 
whatever it does measure is called reliability. Even a very 
good test is less reliable than the platinum standard; but if 
similar scores result from administering a given test twice to 
the same individual with an interval of two or three days 
between, the test is considered to be reliable. In practice, the 
same test is not usually repeated, since practice affects scores. 
Hence reliability may be determined by the consistency with 
which equivalent forms of a test give comparable results. For 
example, an intelligence test is reliable if the subjects make
a score within five points of their other scores on equivalent
forms. If, however, the scores of half the students vary more
than fifteen points from their first score, the measure would
be considered unreliable.


Reliability, like validity, is numerically indicated by a
coefficient. Equivalent forms of the test are administered, and
coefficients of correlation are worked out for the two forms.

³ Lee J. Cronbach, Essentials of Psychological Testing, New York:
Harper & Brothers, 1949, pp. 252-253.


HOW TO IDENTIFY A “GOOD” TEST 21 


The coefficient in this case is called the coefficient of relia- 
bility. For a given test, this index number will be found in 
the manual and in published reviews. If the publisher of the 
test does not indicate the coefficient of reliability, it is prob- 
ably so low that advertising it would do little good. Hence, 
teachers would do well to use tests for which the indicated re- 
liability is somewhere between .80 and 1.00, remembering, 
of course, that it will never actually be 1.00. 
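
A coefficient of reliability of the equivalent-forms kind can be worked out as an ordinary product-moment (Pearson) correlation between the scores on the two forms. The following sketch assumes that method; the scores are invented for illustration and do not come from any published test.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores on each form must vary")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores of six pupils on two equivalent forms of one test.
form_a = [92, 105, 110, 98, 120, 101]
form_b = [95, 103, 112, 96, 118, 104]

reliability = pearson_r(form_a, form_b)
print(round(reliability, 2))
print("within the .80 to 1.00 range suggested above:", reliability >= 0.80)
```

A real reliability study would rest on far more than six pupils; the small sample here only keeps the arithmetic easy to follow.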

Each of two or three different tests of intelligence may be 
reliable although their results do not concur. Thus, three tests 
with reliabilities of .82 or better were given to one subject 
with the following results: test A, IQ 92; test B, IQ 115;
test C, IQ 123. However, on equivalent forms of each of the
three tests, the scores of the same individual varied no more
than seven IQ points. This observation is made to impress
upon teachers that it is important to indicate what test is
being referred to when an IQ is reported. It is also important
to know that scores on a test in which there is a large
nonverbal factor (test C above) will often vary markedly from
scores on a test in which language facility is an important
factor. In short, a test that is reliable in itself may seem
unreliable when it is compared with another type of test.


⁴ Leona E. Tyler cites a table which shows how specified percentile
scores may be interpreted in terms of given coefficients of reliability.
See Leona E. Tyler, The Work of the Counselor, New York:
Appleton-Century-Crofts, Inc., 1953, p. 117.


The Criterion of Comparability


Another desirable characteristic of tests is comparability.
If the teacher is to gain anything more from the test than
a knowledge of the status of a pupil at the time he took the
test, which is helpful but limited information, he must use
tests which are comparable. This will make it possible to see
how much the pupil has grown in a given period of time.
That is, a test is given at the beginning of a unit of work or
a semester and a comparable (equivalent) test is administered
at the end of the period. The difference between the scores
indicates how much the pupil has grown during the period.
If the tests are not comparable, they present a distorted view
of the pupil. For example, a reading test with supposedly
comparable forms was administered by two teachers. One gave
form A first and the other gave form B first, and each gave
the alternate form at the end of an intensive
reading-instruction program. The average gain in pupil scores
for the teacher who gave form A first was 30 percentile points,
whereas the average change for the teacher who gave form B
first was a loss of five percentile points. The order in which
the tests were given by each teacher was reversed in another
trial on different groups of pupils, and this time the pupils of
the second teacher made remarkable gains and those of the
teacher who had been so successful on the first trial showed
a slight loss in average percentile standing. Obviously forms
A and B were of different degrees of difficulty; the tests were
not comparable.
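
The reasoning in this example can be put into rough arithmetic. If we assume, purely for illustration, that each observed change is the sum of real growth and a fixed difference in difficulty between the two forms, then giving the forms in both orders lets the two be separated. The sketch below makes that additive assumption, which the account above does not state, and treats percentile points as if they could be averaged, which is only approximately true.

```python
def separate_growth_from_form_difference(change_a_then_b, change_b_then_a):
    """Split two observed average changes into growth and form difference.

    Assumes (hypothetically) that
        change_a_then_b = growth + form_difference
        change_b_then_a = growth - form_difference
    where form_difference is how much higher pupils score on form B than
    on form A for reasons unrelated to learning.
    """
    growth = (change_a_then_b + change_b_then_a) / 2
    form_difference = (change_a_then_b - change_b_then_a) / 2
    return growth, form_difference


# Figures from the example: +30 points (A first) and -5 points (B first).
growth, form_difference = separate_growth_from_form_difference(30, -5)
print(growth, form_difference)  # 12.5 17.5
```

Under these assumptions, more than half of the apparent 30-point gain would be due to the forms rather than to the pupils.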

Comparability is usually achieved by the use of what are 
known as equivalent forms of a test. Most tests have two 
equivalent forms, and some tests have three or four forms. 
The teacher who wishes to know what progress his pupils are 
making will want a test with at least two equivalent forms. 
Two forms of the same test may be prepared by what is 
known as the split-half method. A long test is given, and an 
item analysis is made; the test is then split into two parts, 
each part containing the number of questions that were missed 
by, say, 25, 30, or 35 pupils. For each question missed by a
certain number of pupils on the trial run of the test, a parallel
item is included, worded differently. A carefully designed test
with equivalent forms would yield like results on successive
days, when administered to the same pupils, if it were not for
"practice effect." Some test manuals instruct the teacher to
subtract a given number of points from the second test to 
correct for practice effect and hence obtain comparable re- 
sults. Regardless of whether form D, C, B, or A is given first, 
practice effect should probably be considered in interpreting 
the score on the second test. 
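
The correction for practice effect described above amounts to a single subtraction before the two scores are compared. A minimal sketch follows; the allowance of three points is invented for illustration, since the proper figure, if any, must come from the manual of the particular test.

```python
def corrected_gain(first_score, second_score, practice_allowance=0.0):
    """Gain between two equivalent forms after subtracting a practice
    allowance (taken from the test manual) from the second score."""
    return (second_score - practice_allowance) - first_score


# Hypothetical pupil: 42 on the first form, 50 on the second,
# with a manual that prescribes a 3-point practice allowance.
print(corrected_gain(42, 50, practice_allowance=3))  # 5
```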


The Criterion of “Sampling” Adequacy 


If a test is to reveal how much a pupil knows, it must sam- 
ple adequately; that is, it must contain enough questions to be 
truly representative. Let us assume that two pupils are being 
tested in geography. One of them knows one of the ten items 
of information on the test and the other knows nine of the 
ten. A test of two items is given. If it includes the one item 
the first pupil knows, he will get a score of 50 per cent; and 
if it happens to include the one item the second pupil does 
not know, he, too, will get 50 per cent. Yet one knows nine 
times as much as the other. Warped views of test subjects can 
be avoided by adequate sampling, i.e., by including enough
items so that the gambling chance is minimized. Adequacy of
sampling is achieved by including enough items so that
additional items no longer seem to influence the score of the
individual. At the same time, tests usually include only enough
items to minimize the chance factor, for additional items do
not seem to add to the accuracy of the test. Some tests,
however, have a limited usefulness in spite of inadequate
sampling. One intelligence test, for example, consists of
fifteen items. In the hands of a clinical psychologist it
possesses the advantages of a rough but rapid screening device.
But persons who do not fully appreciate the handicap of limited sampling
might easily form an erroneous opinion of the test subject on 
the basis of this test. 
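
The geography example above can be checked by listing every test of a given length that could be drawn from the ten items and noting the scores each pupil could receive. The sketch below does this by brute force. Numbering the items 0 through 9, and deciding which particular items each pupil knows, are arbitrary choices made for illustration.

```python
from itertools import combinations


def possible_scores(known_items, total_items, test_length):
    """All per-cent scores a pupil could receive, over every possible
    test of `test_length` items drawn from `total_items` items."""
    scores = set()
    for test in combinations(range(total_items), test_length):
        correct = sum(1 for item in test if item in known_items)
        scores.add(100 * correct / test_length)
    return sorted(scores)


first_pupil = {0}                 # knows one item of ten
second_pupil = set(range(9))      # knows nine items of ten

for length in (2, 5, 10):
    print(length,
          possible_scores(first_pupil, 10, length),
          possible_scores(second_pupil, 10, length))
```

With a two-item test both pupils can receive 50 per cent; with five items the first pupil cannot score above 20 per cent and the second cannot score below 80, so the chance of confusing them disappears.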


Some standardized tests have what are known as long and
short forms. The longer form is used when there is plenty of


24 EVALUATION TECHNIQUES FOR CLASSROOM TEACHERS 


time for administration. However, experimental administra- 
tions have indicated that the short form samples widely 
enough so that there is little difference in its reliability as com- 
pared with the longer form. Using fewer questions than are 
provided in the short form leads to such variability in results 
that further reduction in the number of items is considered 
to be inadvisable. Using more items than the long form in- 
cludes does not give more consistent results; rather, the addi- 
tional items are subject to the law of diminishing returns. 
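
The diminishing return from added items can be illustrated with the Spearman-Brown formula, a standard psychometric estimate of how reliability changes when a test is lengthened or shortened with comparable items. The text does not name this formula, so its use here is our assumption, and the starting reliability of .80 is invented for illustration.

```python
def lengthened_reliability(reliability, length_factor):
    """Spearman-Brown estimate of reliability when a test is made
    `length_factor` times as long with comparable items."""
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical test with reliability .80 at its present length.
for factor in (0.5, 1, 2, 3, 4):
    print(factor, round(lengthened_reliability(0.80, factor), 3))
```

On these assumptions, halving the test lowers the estimate to about .67, doubling it raises the estimate to about .89, and each further addition of the same number of items adds less than the one before.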


The Cost Factor 


The use of tests in obtaining a clearer view of the abilities 
of pupils is sometimes limited by cost factors. School-board 
members and administrators often feel that the cost of tests 
is prohibitive. It is therefore desirable to get the least expen- 
sive tests available for the advantages derived. Fortunately, 
the most costly tests are not always the most dependable from 
the point of view of adequacy, reliability, or validity. Many 
tests that are relatively inexpensive on a per-pupil basis give 
quite valuable clues to understanding pupils. Expense is fur- 
ther reduced by the provision of scoring sheets. Thus, the 
same test booklet can be used over and over by inserting new 
answer sheets. This makes it possible to reduce the financial 
outlay to a penny or two per pupil once the booklets have 
been purchased. 

It is hard to generalize about monetary costs. As the 
teacher examines tests, he should study the various test 
catalogues to see how much a package of twenty-five tests will 
cost, whether the manual of directions is included in the price, 
and whether or not there are separate answer sheets available.
The price tag is not a highly dependable criterion, because
one must consider that the objective is to get the best test for
the price paid.

Nor is the question of economy limited to a consideration
of monetary outlay alone. Good tests should be economical of
the teacher's time; that is, the results should be easily
scorable. Preferably, the teacher should be able to score a test
simply by counting the correct or incorrect responses. Little
computation should be required, for computation not only
takes time but provides a chance for error to creep in and
thus reduces the reliability of the test. Economy of time should
also be considered in giving the test. Specifically, the
directions should be easy to understand and easy to explain
clearly to the pupils. Examination of the manuals of directions
of different tests of the same knowledge or ability will
indicate that they differ widely with regard to the ease with
which they can be understood and administered.


The Test Manual


In order to be of maximum usefulness to the teacher, the
test should include a manual of directions. This manual
should do the following specific things: (1) It should explain
specific advantages, features, and purposes of the test.
(2) It should explain the process of its standardization so
that the teacher will know how much confidence can be
placed in the results obtained. (3) It should give clear and
concise directions for administering the test. The total time
to be allowed and the time allotment for individual parts
should be clearly indicated. (4) Even if the scoring
procedures seem perfectly obvious, they should be described in
detail. (5) Considerable space should be devoted to an
interpretation of the scores. The meaning of specific scores in
terms of average grade placement, quality of work, or
comparative standing in a group should be indicated.
(6) Suggestions should be made for the intelligent use of the
results in pupil guidance.


SUMMARY


Tests differ widely in the care with which they are
constructed, and they are not of equal value in achieving an
objective view of the pupils. The relative worth of tests can be
judged in terms of their objectivity, validity, and reliability. 
The better tests are economical of time, effort, and money; 
they sample widely; and the manuals that accompany them 
are sufficiently detailed to indicate exactly how the test should 
be viewed, administered, scored, and interpreted. Tests that 
meet these criteria will be of real help to the teacher in estab- 
lishing realistic but growth-inducing goals for pupils. 


STUDY AND DISCUSSION EXERCISES 


1. Does the desirability of objectivity imply that the teacher's
judgments have no place in evaluation and appraisal? Explain your
answer.

2. Evaluate this statement: A test may be reliable without
being valid.

3. Can a test be valid without being reliable?

4. What kind of predictions can be made on the basis of a
coefficient of .20 (e.g., the coefficient of correlation between
size and intelligence)?

5. Are there test situations in which the factor of
comparability is not important?

6. What are some reasons why a 100-item test might be more
valuable than a 10-item test?

7. Class exercise: Study copies of a test manual and attempt to
determine its worth.


This chapter, "The Criteria of a Good Examination," deals
with validity, reliability, adequacy, objectivity, administrability,
scorability, comparability, economy, and utility of tests.

Ross, C. C.: Measurement in Today's Schools, 3d ed. (revised by
J. C. Stanley), Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1954,
pp. 106-135.
This chapter deals with the characteristics of a satisfactory
measuring instrument. In addition to defining the terminology
used in this discussion, the author makes suggestions on the uses
and limitations of tests.

Rulon, Phillip J.: "Validity of Educational Tests," Test Service
Notebook, no. 3, Yonkers, N.Y.: World Book Company, Division
of Test Research and Service, 1947, 4 pp.
This free leaflet discusses validity in terms of the objectives of
education and the criterion of behavior and describes how tests
are designed to obtain greater validity. Relative merits of
short-answer and essay tests are discussed.

Trow, W. C.: Educational Psychology, 2d ed., Boston: Houghton
Mifflin Company, 1950, pp. 326-365.
A discussion of the terminology used in testing procedures.
Reliability, sampling, correlation, validity, objectivity, and types
of questions are treated. Suggestions for summarizing test
results are given.

Wrightstone, J. Wayne, Josiah Justman, and Irving Robbins:
Evaluation in Modern Education, New York: American Book
Company, 1956, pp. 16-28.
In addition to the objectives for a testing program, this
chapter deals with the relation of evaluation techniques to the
curriculum.

CHAPTER THREE 


Choosing the Right Test 


If there were one best test for evaluating any one pupil trait 
or ability, there would be no problem involved in test selec- 
tion. The one best test would have proven itself, and custom 
and common practice would make it widely known. However, 
this “happy” situation would necessarily impose some limita- 
tions. For example, the test would have to be widely appli- 
cable; it would not be adaptable to the problems of a particular 
school system. Users would be obligated to pay whatever
price was charged for it. The publisher of the test could prob- 
ably afford to be slow in rendering service to users and might 
even be loath to make any changes in a test so widely ac- 
cepted. The necessity for test selection carries with it some 
advantages, then, not the least of which is the obligation of 
teachers to investigate tests and thus learn more about their 
nature, limitations, and advantages. 

In order to select tests wisely, it is necessary to (1) define 
the purposes of the testing program, (2) select the areas to be 
tested in the light of the purposes, (3) apply the criteria of 
good tests to the tests available in the selected areas, and (4) 
evaluate the tests used in the light of experience as a basis 
for future selection. These steps will be examined in detail 
in this chapter. 


Capitalizing on Group Wisdom


Since there is an opportunity involved in the task of test
selection, several teachers should be given the privilege of
participating. It is advantageous to bolster the wisdom of one
or two individuals with the suggestions and advice of others,
and pooling this knowledge will also help teachers to realize
the values and shortcomings of tests as well as the problems
involved in using them.

It has proved advantageous for the teachers in a school to
select jointly the tests that will be most profitable for them
locally. This is sometimes done in the larger systems by having
a committee formulate the objectives of the testing program
and choose the tests in the light of the purposes they are to
serve. In smaller systems all teachers might well be involved
in the preliminary discussion of purposes and problems. The
group approach is especially advantageous when different
forms of the same tests can be used in several consecutive
grades, thus gaining the advantage of the criterion of
comparability discussed in Chapter 2. The sixth-grade teacher,
for example, can then compare the score of a pupil in his
grade (on an alternate form of the test selected) with the
score obtained by that pupil on another form of the same test
when he was in the fifth or the fourth grade. If different tests
are used, although both are valid and reliable, the results may
not be comparable, and if the scores are unwittingly compared,
the comparison will be misleading. Thus teachers administering
both the Metropolitan and Iowa achievement tests have
found that the class average is as much as half to a full grade
higher on one test than on the other. The scores of individual
pupils may vary as much as a full grade on the two tests. Both
tests are accurate, reliable, and valid; they were simply not
standardized simultaneously for purposes of comparison.

The group approach also has the advantage of capitalizing
upon the experience of various teachers with different tests.
Pupil reactions and typical problems in administration can be
anticipated, many of which may not be described in the test
manual.


DETERMINING THE PURPOSES OF THE 
TESTING PROGRAM 


Whether tests are selected by each teacher individually for 
his own class or by a group of teachers for a number of grades, 
the first step is to determine objectives—to decide what the 
testing program is designed to do. Let us assume that the list 
would include some or all of the following objectives: 


To gain insights into the social facility of the pupils as
individuals

To obtain information about their aptitudes for learning in
general

To obtain information about their specific talents or relative
strengths

To discover their present status in subject-matter
achievement

To find clues to ways of overcoming specific weaknesses

To motivate pupils to put forth consistent and serious
effort

This list of objectives suggests that the tests are to be used
not as ends in themselves but as clues to the improvement of
learning and instruction. They are devices designed to
facilitate each child's growth, and the results are to be used as
corroborative data and as supplements to the teacher's
observations.¹ An experienced teacher will know without being told
that the tests that are most appropriate in the fifth grade are
not necessarily those that provide valuable clues in the first
grade. In formulating objectives, the group must take into
consideration the fact that the testing program is a
whole-school program. We shall therefore approach the problem of
selecting tests in terms of a minimum program for the primary
grades, the intermediate grades, and the upper grades.

¹ This point can hardly be overemphasized. The teacher sees the
total functioning of the child, not just his functioning in a test
situation. The following statement by a specialist in vision
illustrates this: "Don't mistrust your own observation about a
child. Even a competent oculist or ophthalmologist does not have
the advantage of seeing symptoms of visual difficulty after the
child has been using his eyes for a prolonged period. His tests may
be very good as far as they go. The teacher sees the child in a
functional situation." Evaluation of the child's vision is parallel
in this sense to the evaluation of his intelligence or social
adaptation.


SIGNIFICANT AREAS TO BE TESTED


It might appear that the most important test in every grade
is the intelligence test. However, as we shall see in a later
chapter, rate of intellectual growth has not become steady
enough in the first grade to make the intelligence test a wholly
reliable indication of growth. Giving intelligence tests at this
level involves the risk that a child may be branded as
unintelligent because he does not understand the directions in a
group test or because for some reason it is difficult for him to
maintain interest. The results of one type of test, however,
are much less likely to be stigmatizing after a year or two:
the reading-readiness test, which many teachers prefer to
the intelligence test. This device has the advantage of dealing
rather specifically with an aptitude that is of immediate
practical importance to the teacher and the pupil. The results are
directly applicable to the question of whether reading
experiences should be initiated at once or whether it would be better
to spend time on a developmental readiness program. An IQ
will not yield this information; an MA would come closer to
telling the teacher what he needs to know.
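
Why a mental age (MA) says more than an IQ on this question can be seen from the classical ratio definition, IQ = 100 × MA / CA, where CA is chronological age. The sketch below assumes that ratio definition; deviation IQs are computed differently, and the two pupils are invented for illustration.

```python
def mental_age_months(iq, chronological_age_months):
    """Mental age implied by a ratio IQ, where IQ = 100 * MA / CA."""
    return iq * chronological_age_months / 100.0


# Two hypothetical pupils with different IQs but the same mental age.
print(mental_age_months(100, 72))  # 72.0
print(mental_age_months(120, 60))  # 72.0
```

The IQ describes rate of development relative to age, whereas the MA describes the level the child has reached, and it is the level reached that bears on readiness for a given task.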

Intelligence tests may be used, with due reservations, in the 
primary grades, but some teachers prefer to wait until the third 
grade or even the intermediate grades. Selection of a test at 
this level, however, should be influenced by the experience of 
the teachers involved. If none of the teachers in the group has 
had any experience with the tests, it is wise to apply to a 
teacher in another system for suggestions. A letter addressed 
to the superintendent of schools of a city system would be 
turned over to competent persons (perhaps specialists in
testing) who would be willing to make helpful suggestions.

After some tests have been suggested, the group should
obtain the description and evaluation of these tests from the
basic reference book, Oscar Krisen Buros' Mental Measurements
Yearbook. The value of this work can be partially determined
by reading a representative entry such as the following:²

[255]
Pintner General-ability Tests: Verbal Series. Grades kgn-2, 2-4,
4-9, 9+; 1923-46; 20¢ per manual; World Book Company.

a) Pintner-Cunningham Primary Test. Grades kgn-2; 1923-46;
Forms A, B, C; $1.45 per 25; 35¢ per specimen test; Rudolf
Pintner, Bess V. Cunningham, and Walter N. Durost.

b) Pintner-Durost Elementary Test. Grades 2.5-4.5; Scales 1
(requires no reading) and 2 (requires reading) may be used
separately or together; Forms A, B; $2.00 per 25 Scale 1; $1.60 per
25 Scale 2; 35¢ per specimen set; (45) minutes per scale; Rudolf
Pintner and Walter N. Durost.

c) Pintner Intermediate Test. Grades 4.5-9.5; 1931-42; revision
of Pintner Intelligence Test; Forms A, B; $1.70 per 25; 35¢ per
specimen set; $1.20 per 25 machine-scorable answer sheets; 45
(35) minutes; Rudolf Pintner.

d) Pintner Advanced Test. Grade 9 and above; 1938-42; Forms
A, B; $1.70 per 25; 35¢ per specimen set; $1.20 per 25
machine-scorable answer sheets; Rudolf Pintner.

² Oscar Krisen Buros (ed.), The Third Mental Measurements
Yearbook, New Brunswick, N.J.: Rutgers University Press, 1949, p. 334.


Following these basic data are reviews and evaluations of
this test by competent reviewers. Excerpts from two of these
reviews follow:³


Stanley S. Marzolf, Professor of Psychology, Illinois State Normal
University, Normal, Ill.

The reliabilities obtained by the split-half and interform
methods for the various batteries are, in the majority of cases, in
excess of .90. Sources of reliability data are given in all instances.
Standardization has been based on "approximately 100,000
tests from widely separated parts of the country." Further
collection of scores for normative purposes is now in progress.
The computation of deviation IQs is amply explained and
illustrated. For the Intermediate and Advanced Tests a monograph
which facilitates computation of IQs and centile equivalents is
provided.
This series is one of the best available for school use. The tests
are easy to give and score. Raw scores are easily converted to a
normative form. The same score system (standard score, mental
age, and deviation IQ) is used throughout the series. The attempt
to make the tests comparable at all grade levels is commendable,
even though empirical evidence that this has been accomplished
is lacking.


D. A. Worcester, Chairman, Department of Educational Psychology
and Measurements, The University of Nebraska, Lincoln, Neb.

The intermediate and advanced tests each have eight subtests.
All are timed but with limits so liberal, intentionally, that they are
not to be considered as speed tests. The materials of the tests are
on the whole of the kind that one finds in most of the conventional
intelligence tests.
Each test of the series has received careful statistical treatment
and the statistical findings are given in the manuals. Norms for
the tests are articulated with each other, making possible
comparable measures at the various age levels. Scores may be
interpreted in almost any way which the user may wish: standard
scores, ratio or deviation IQs, percentile ranks, mental ages, or
grade equivalents. Machine scoring is available for the
intermediate and the advanced tests. While the task of administering
these tests is somewhat greater than that for some of the tests
constructed more recently, there is evidence that they have been
constructed with care and may be employed with good results.

³ Ibid., p. 336.


In the Yearbook several hundred tests are listed, classified, 
and evaluated; hence the suggestions of experienced users of 
the tests can save time and prevent the possibility of the teach- 
er's becoming dismayed by a plethora of titles, addresses, and 
statistics. If the Yearbook is not available locally, it can prob- 
ably be borrowed from the state library or the state depart- 
ment of education. 

If it is not possible to get hold of the Yearbook, it is wise 
to obtain catalogues from the test publishers. Although this 
list should not be interpreted as endorsing any individual test, 
such addresses as the following will provide a starting point: 
Educational Test Bureau, 720 Washington Avenue, S.E., 
Minneapolis 14, Minnesota; California Test Bureau, 5916 
Hollywood Boulevard, Los Angeles 28, California; American 
Council on Education, 744 Jackson Place, Washington 6, 
D.C.; World Book Company, Yonkers 5, New York; Science 
Research Associates, Inc., 57 W. Grand Avenue, Chicago 10, 
Illinois; Psychological Corporation, 522 Fifth Avenue, New 
York 36, New York. 

The Yearbook and the addresses of publishers will, of 
course, be helpful in selecting all kinds of tests, not merely 
those relating to mental ability. 

Achievement tests are designed to indicate the pupil's present
status regarding skill and knowledge in such subject-matter
areas as reading, vocabulary, arithmetic fundamentals,
arithmetic operations, spelling, English usage, etc. Advanced
batteries for use in the upper grades and high school sample,
in addition, such areas as literature, history, civics, and
geography. Norms for achievement tests are typically described
in terms of age standards, grade placement, and percentile
ranks. Again, these norms (or averages) represent the
scores of typical third, fifth, sixth, etc. graders and are not to
be considered as standards for individual children to reach
or excel. Nor should the teacher regard the norm as a standard
that should be achieved by his class this year. Achievement
scores must be interpreted in terms of indicated pupil
potential; that is, the average on achievement tests of the class
this year may be in part evaluated in terms of the average
obtained on a test of mental ability. What is important is that the
norms provide clues for evaluating the status and progress of
individual pupils. The achievement test selected should provide
equivalent or comparable forms, since maximum utility
will be obtained when the score a pupil makes this year can
be compared with his score of a year or two ago. If a different
test is used, even a test covering the same area, the
feasibility of comparing them is materially reduced.

If a particular pupil does not seem to be doing so well on
achievement tests as his mental-ability test “promised,” an
explanation may be derived from a diagnostic test. In such
subject areas as reading, arithmetic, and spelling, a diagnostic test
serves the purpose of locating rather specifically the difficulty
the child is encountering. In reading, the difficulty may be a
weak vocabulary, lack of method in word attack, or lack of
experience leading to interpretative ability. The test does not
tell what should be done; it simply narrows the area of search
for a constructive remedial program. Similarly, in arithmetic
the diagnostic test will help one find a specific area of
difficulty, which might be a particular erroneous number
combination, such as 7 × 6 = 52, or failure to understand
borrowing in subtraction. Or perhaps the pupil understands the
processes but does not make the correct choice of operations;
that is, perhaps he does not understand the written problem.
In order to detect particular areas of difficulty, the diagnostic
test is divided into distinct parts which employ specific
operations (addition, subtraction, etc.) and particular number
combinations.
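
The way a diagnostic test narrows the search can be pictured with a
small tally. The sketch below is only an illustration: the pupil's
responses, the operation labels, and the function name
error_rate_by_operation are all hypothetical and are not taken from
any published test. It counts errors separately for each part of the
test and reports the part with the highest error rate.

```python
from collections import defaultdict

def error_rate_by_operation(responses):
    """Return the share of wrong answers for each operation.

    responses is a list of (operation, is_correct) pairs for one pupil.
    """
    attempts = defaultdict(int)
    errors = defaultdict(int)
    for operation, is_correct in responses:
        attempts[operation] += 1
        if not is_correct:
            errors[operation] += 1
    return {op: errors[op] / attempts[op] for op in attempts}

# Hypothetical item-by-item record for one pupil (illustrative only).
pupil = [
    ("addition", True), ("addition", True),
    ("subtraction", False), ("subtraction", False), ("subtraction", True),
    ("multiplication", True), ("multiplication", False),
]

rates = error_rate_by_operation(pupil)
# The part with the highest error rate is where to look first.
weakest = max(rates, key=rates.get)
print(weakest, round(rates[weakest], 2))
```

As with the test itself, the tally says where the trouble lies, not
what should be done about it.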

Teachers often find that special-aptitude tests are of value 
in understanding the “whole” child. Tests of musical aptitude, 
mechanical aptitude, and art may be helpful in suggesting 
academic approaches which will permit the pupils to experi- 
ence a degree of success and thus become more strongly 
motivated. Language-aptitude tests, mathematics-aptitude 
tests, and vocational-aptitude tests are useful in academic and 
vocational counseling at the secondary school level. 

Teachers have traditionally been, and many of them are at
present, concerned mainly with the academic adjustment and
achievement of pupils. However, increasingly teachers are
realizing that other phases of adjustment are of equal
importance. In fact, personal and social adjustment may be of
even more importance immediately than academic adjustment,
because in order to function well in the academic situation
it is necessary that the pupil be as free as possible of
personal and social problems. There are several methods of
personality evaluation. One of the most readily available of these
methods is direct observation of behavior. Naturally, some
knowledge of child and adolescent psychology, mental hygiene,
and abnormal psychology contributes to make the
teacher’s observations more penetrating and his evaluations
more accurate. Rating schedules also are of real value in
narrowing the range of search for possible sources of
difficulty in personality adjustment. These schedules are of two
kinds: one in which other pupils or the teacher rate a pupil
in terms of given characteristics, and one in which the pupil
rates himself in terms of given qualities or reactions.
Personality questionnaires are similar to rating scales except that
instead of rating one's self on a three- or five-point scale, the
subject answers the questions with “Yes,” “No,” or
“Questionable.” However, the results of formal questionnaires and
rating schedules must be interpreted in terms of how the pupil
behaves in the classroom and on the playground.

* Personality includes all that a person is and does—his capacities
and abilities, achievements, and hopes. Thus intelligence and
knowledge are a part of personality. However, for purposes of
discussion of areas to be tested, we shall have reference here
mainly to personal security, confidence, and general social
functioning.

Personality rating scales and questionnaires cover a variety
of areas of functioning. Some deal with health attitudes,
ethical considerations, family relationships, and interpersonal
adjustments. These evaluative techniques are steadily increasing
in number. Hence it is advisable that the teacher study the
catalogues, tentatively select a few tests, obtain specimen sets,
and then carefully read the manual to determine whether each
specific test fills the needs of his situation.


APPLYING THE CRITERIA OF GOOD TESTS


After deciding what areas are to be tested and after selecting
sample tests, the teacher group should study the tentatively
selected tests in each area from the viewpoint of the
criteria of good tests. In order to discover the relative merits
of the tests in the various areas, the significant data regarding
each test should be summarized or tabulated on a check list;
these data, it will be remembered, may be taken from test
catalogues, the manuals of directions, or the Mental
Measurements Yearbook. A sample check list is shown in Figure 1.


[Figure 1. Check list for selecting tests.] The explanation of the
notes in the list is as follows:

1. High correlation with achievement in first grade.

2. Measures readiness in reading, arithmetic, and writing rather than
in reading alone.

3. Easier than most tests of its kind.

4. High correlation with intelligence makes it probable that an
intelligence test would be unnecessary.

5. Good screening device. Should be supplemented by observation
and intelligence-test data.

6. Brief, easily understandable. Norms may be interpreted without
skill in statistics. Predictive value varies with methods used in
teaching reading.

7. Excellent manual. Contains bibliography valuable for teaching
suggestions.

8. Heavy stress on letter symbols involved in reading.

9. Other aspects of intelligence besides "academic" aptitude are
considered.

10. Tests of spatial relationships, logical reasoning, numerical
reasoning, and verbal concepts.

11. Takes somewhat longer than many other tests because of
breakdown into part scores.

12. Verbal and nonverbal scores.

13. Valuable because it involves less reading than many other tests.

14. Much of the explanation is in terms of the authors’ experience,
which is, however, considered more than adequate by reviewers.

15. Correlates well with school achievement particularly for group
prediction.

16. Considered by reviewers to be somewhat weak in terms of
individual predictions.

17. Ease of scoring is main feature.

18. Because of variation in courses of study, curricular validity must
be determined locally.

19. Subtests cover all fundamental school subjects. Primary battery
contains word and phrase recognition, word meaning, and numbers.

20. There are partial batteries for those who wish to give the test in
the shorter periods indicated in the “Time” column.

21. Covers items found in the typical curriculum. Must be interpreted
in the light of local emphases.

22. Superior and practical.

23. All the tests incorporate diagnostic items which should, however,
be used in conjunction with supplementary data.

24. An attempt has been made to interpret scores in the light of
block-promotion practices.

25. Not cited, but reviewers assert that the tests have been carefully
and competently constructed.

26. Brief, simple, clear instructions to pupils. Directions for
interpretation are clear and practical.

27. National norms should probably be considered seriously only in
the skill areas.

28. Gives percentile rank for adjustment in terms of home, school,
health, and social areas for various school levels.

29. User would need training in testing and mental hygiene.

30. Can be used for rough psychiatric screening.

31. The purposes of the test are inadequately disguised, and
falsification of answers may be possible.

32. Offers some remedial approaches which, though acceptable, are
somewhat superficial.

33. Subtests are: self-reliance, sense of personal worth, sense of
personal freedom, feeling of belonging, withdrawing tendencies,
and nervous symptoms.

34. A coding system prevents subject from discovering the exact
nature of the test.

35. Calls attention of teachers to difficulties of adjustment of which
they might otherwise be unaware.

The check list obviously does not automatically select the
test. Each test seems to have some merits and some
disadvantages which are not found in others. Group discussion of
relative values is advisable for the selection of the tests which
are most appropriate for local purposes.


Precautions 


New tests are published periodically. Some probe areas 
which have not previously been investigated, and others 
apply new or revised techniques in familiar areas. Because 
of the ample supply of new instruments, two suggestions are
pertinent. 

First, very often old tests are preferable simply because they 
have been previously used in the school. Teachers are familiar 
with them and are aware of their merits and shortcomings. 
Previous scores provide an opportunity to develop local norms,
and the scores made in the current year are more meaningful 
in relation to the national norms. Unless a new test has dis- 
tinctly superior features, it is often advisable to continue to 
use the older ones. 

Second, if a new test or test battery is selected, it is wise
to continue the parallel use of the old test in the same area.
The results can then be compared, and the gains or losses
produced by the change can be evaluated. By the same token,
the new test should be given an adequate trial. It should be
used for a minimum of three years, and if it is then found to
be satisfactory, its continued use is justified.


Follow-up Evaluation 


One aspect of the problem of test selection remains to be 
dealt with, even after the committee has chosen a given bat- 
tery and the tests have been administered: the tests should be 
discussed and evaluated in terms of the experience of ad- 
ministering them and making use of the results. Teachers find 
it helpful to discuss the problems they have encountered and 
get the suggestions of other teachers for overcoming these dif- 
ficulties. For example, such remarks as the following are likely 
to be made: “This test is too long for first graders.” “This 
achievement test does not parallel the suggested course of 
study for the state (or locality).” “This test is so short that I 
doubt that it samples adequately.” These remarks, however, 
should be viewed as precautions and limitations to be applied 
in interpreting test results, since all tests are likely to have 
their limitations. Some of the value of tests will be lost if a 
test is discarded because it does not fully suit all users. It is
probably better to put up with some shortcomings in a test 
rather than dispense with the values of comparability that re- 
sult from using the same test over a period of years. 


SUMMARY


Choosing tests which will make a maximum contribution
to the understanding of children and the improvement of
instruction is a process which should involve many teachers.
Even if the services of an “expert” are available, teachers
should participate because (1) the work involved in choosing
has educative value, (2) the school and the individual
teachers should gain the advantages of pooled experience and
study, (3) teachers know more about local educational
objectives than experts, and (4) teachers know the needs of
pupils in terms of their community background.

The first task of the group is to determine as exactly as 
possible the purposes which the tests are to serve or to facili- 
tate. The kinds of tests should be named—intelligence, special 
aptitudes, achievement, personality inventories, etc.—and 
tentative lists should be suggested by capitalizing on the ex- 
perience of teachers on the staff or by contacting teachers or 
administrators outside the system. The group should obtain 
sample copies of the suggested tests, together with manuals, 
and then compare the tests, using the manual, catalogues, and 
the Mental Measurements Yearbook, in terms of the criteria 
of “good” tests. 

The process of test selection and evaluation does not end
with ordering tests. It should be continued in the light of
knowledge and experience gained during actual use of the
tests. It should be remembered that, although selection is
important, the really crucial issue is the interpretation and use
of data. This important consideration will be discussed in
Chapter 13.


STUDY AND DISCUSSION EXERCISES 


1. Explain how it is that two tests covering the same subject 
or ability may both be reliable and valid but not be comparable. 

2. Draw up a list of purposes of the testing program for the 
school where you teach or for some school with which you are ac- 
quainted. 

3. Look up some test with which you are acquainted in the 
Mental Measurements Yearbook and see what others think of its 
value. Is the report such that you would like to use or continue 
to use it? 


4. Explain the meaning of the statement, “Tests do not tell
what should be done.” If they do not do this, then of what value
are they?

5. Do you agree with the contention that the teacher's judgment
or estimate should be used in evaluating the capacities of pupils?
How does this fit with the notion that evaluations should be
objective?

6. Prepare a check list similar to the one presented in this
chapter. Add to it some tests in several areas and make a tentative
selection of the best test in each area to fit your particular needs.

7. Why is the follow-up evaluation of the test important? How
long should one keep an unsatisfactory test before sacrificing the
value of year-by-year comparisons?


SUGGESTED ADDITIONAL READINGS 


Buros, Oscar K. (ed.): The Fourth Mental Measurements
Yearbook, Highland Park, N.J.: The Gryphon Press, 1953.
This book should be available to every committee charged with
responsibility for test selection. Older tests are reviewed and
evaluated in previous issues of the yearbook. All tests to date
are indexed.

"How to Select Tests," Educational Bulletin No. 2, Los Angeles:
California Test Bureau, 1945 (free).
This brief bulletin gives concrete advice on problems faced in
test selection; it is illustrative of the good free material that is
sometimes available from test publishers.

Jordan, A. M.: Measurement in Education, New York: McGraw-Hill
Book Company, Inc., 1953, pp. 14-39.
This chapter, "Characteristics of Measuring Instruments," defines
and illustrates the qualities of good tests. A knowledge of
the meaning of these qualities is basic to sound test selection.

Traxler, Arthur E., Robert Jacobs, Margaret Selover, and Agatha
Townsend: Introduction to Testing and the Use of Test Results
in Public Schools, New York: Harper & Brothers, 1953, pp.
20-29.
In addition to describing the features of good tests, this chapter
describes the assistance one can get on particular problems
through catalogues, manuals, test-service agencies, and
professional literature.


CHAPTER FOUR 


How Norms Help Us 
Size Up Pupils 


Evaluation of pupil characteristics is a pioneering and chal- 
lenging task because many of the human qualities related to 
educational goals are not readily measurable. It is relatively 
simple to measure height because it is quantitative and pro- 
gresses in regular units which we term inches. Weight and 
chronological age are equally easy to measure. For more com- 
plex qualities, however, such as personality traits, mental 
characteristics, and scholarship, we have no universally ac- 
cepted yardstick or scale. 

There was a time in history, of course, when length, weight, 
days, months, and years posed definite problems of measure- 
ment. There were no standards for comparison, no widely ac- 
cepted units of measurement in these areas. The task of per- 
sons interested in educational measurement has in recent 
years been similar to that of the individuals who, long ago, 
established such widely accepted units of measurement as the 
inch, the meter, the gram, the ounce, the hour, and the month. 
The task in educational measurement today is to establish
meaningful units for the measurement of human behavior in
areas for which no adequate "yardstick" is yet available. The
attempt to develop such measures has resulted in the
widespread use of norms.


NORMS: MEANING AND DERIVATION 


When someone tells us that a certain girl is five feet tall, 
we may think of five feet of linear height as an absolute meas- 
ure, or we may think of persons of our acquaintance who are 
five feet tall. In order to make use of such a measurement, 
we would have to know the age of the girl. She may, for in- 
stance, be short for an adult or tall for an eleven-year-old. 
Hence, the significance of simple quantitative measurements 
depends upon comparisons and relationships. 

In the area of educational measurements, there exist no 
absolute units such as the inch or the foot. A test score, for 
example, is composed of responses to a number of test items.
The items are not all exactly alike, as are standard measures 
such as inches; rather, they vary in nature and in difficulty. 

Let us say, for instance, that Jerry has a score of 43 on an
arithmetic test. What does this score mean? Were there 43 or
143 test items? Were the items related to addition, subtraction,
multiplication, division, or all of these? How difficult was the
test? The score of 43 becomes meaningful only when it is
placed in a framework which enables us to make comparisons.
If we find that 43 is one of the highest scores achieved by
members of Jerry's classroom group, the score takes on a
value, for the important thing is not what score Jerry made
on the arithmetic test, but how his score compares with other
scores derived from the same measuring instrument. The
concept of norms is based upon such comparisons. Test scores are
placed in a framework which helps us to relate scores to one
another. Norms, then, are relative measures or derived scores
designed to help the teacher interpret test results within a
meaningful framework.

When a test is administered and scored, the first result is a 
raw score, which is ordinarily the sum of the correct responses 
or the total of the values assigned to the several items included 
in the test. The raw score, like the score of 43 which Jerry 
attained on the arithmetic test in the foregoing example, has 
no definite meaning in itself. The teacher, like the test maker, 
faces the problem of giving meaning to the test score. 

The teacher who wishes to make Jerry's score more meaningful
might ask this question: “How good is a score of 43
on this test in my classroom? Is it average, better than
average, or below average?” Arranging the papers in order of
size of scores from highest to lowest, he may use the middle 
paper as a point of reference. In this way he can evaluate a 
score of 43 as falling in the upper half or the lower half of 
the scores. Or he may establish a basis for comparison by 
finding the average score. In either case, he has established 
a point of reference which will give meaning to scores within 
this classroom group. 

These procedures are in many respects similar to those em- 
ployed by the professional test maker, who first establishes a 
sample or group as a basis for his reference scores and then 
administers his test to this group in order to study the results. 
To make the scores meaningful, he may select a score at the 
midpoint of the distribution of scores. This score is called the 
median, or fiftieth percentile, and it divides his distribution 
into two equal halves. Half the subjects scored above this 
point and half below. 

Again, the test maker may find the average, or mean, of all
the scores to establish a reference score. The mean is the sum
of the scores divided by the number of cases, and it represents
typical performance for a group. The typical score made by
pupils at a grade level is called the grade norm. The basis for
the norm, then, is the midpoint (median) or the average
(mean). In either case, the norm is a reference point presumed
to represent the level of attainment typical of the defined group.
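
The two reference points just described can be worked out in a few
lines. The sketch below is illustrative only: the nine classroom
scores are invented for the example (only Jerry's score of 43 comes
from the text), and the helper name standing is hypothetical. It
finds the median and the mean and then places a score relative to
each.

```python
from statistics import mean, median

# Hypothetical arithmetic scores for one classroom (illustrative only).
scores = [52, 47, 43, 40, 38, 36, 35, 33, 27]

class_median = median(scores)  # middle paper when scores are ranked
class_mean = mean(scores)      # sum of scores divided by number of cases

def standing(score, reference):
    """Describe a score as above, at, or below a reference point."""
    if score > reference:
        return "above"
    if score < reference:
        return "below"
    return "at"

jerry = 43
print("median:", class_median, "mean:", class_mean)
print("Jerry is", standing(jerry, class_median), "the median")
print("Jerry is", standing(jerry, class_mean), "the mean")
```

In this invented group the median and the mean are close together;
where a few extreme scores pull the mean away from the median, the
two reference points can place the same pupil differently.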

Procedures very similar to this may be used to develop 
norms for any defined grade or age group or for any other 
special classification, such as school beginners, college fresh- 
men, or graduate students. The norm, then, is a reference 
point derived from a study of the scores of a selected group. 
This group is called the standardization sample and should be 
explicitly defined in the manual which accompanies the
standardized test.

The sampling procedures are, of course, very important to
the interpretation of the results of the test for any group. The
teacher will need to know whether the standardization group
is in general similar to his group or in some significant way
different from it. Norms on two tests are not necessarily
equivalent, since the groups from which representative scores, or
norms, were derived may differ. For example, in a recent
report, average IQs for the same pupils in one town varied from
99.9 on test A to 107.2 on test D on four well-known
intelligence tests. For this group of pupils, test D appears to be
relatively easy as compared with test A. The norms for these
two tests are, therefore, not directly comparable so far as these
pupils are concerned.

In general, in establishing intelligence test norms, IQ 100
is the average or median attainment for each age group. Thus
the pupils in the example above differed markedly from the
group which formed the original standardization sample, at
least in ability to respond to test D.

hurst, V. E. Herrick, and

Although the discussion above is in many respects an
oversimplification of the procedures involved in the establishment
of norms, it does appear to represent the basic principles and
problems involved. The procedures in test standardization
might be reiterated and expanded at this point as follows:

1. Test questions are gathered and a trial form of the test 
is developed. 

2. The trial test is administered to a large group of pupils 
at the age or grade level for which the test is designed. 

3. A careful study of results is conducted in an effort to se- 
lect the best items, and the test is drawn up in final form or 
forms. 

4. A large group of individuals is chosen as the basis for 
the derivation of test norms. Problems of grade or age levels, 
typicalness or representativeness, and other basic problems of 
classification of the sample are considered as this selection is 
made. 

5. The test is administered to the selected group under 
standard conditions of directions and time allowances, as 
outlined in the test manual, and the scores are studied. Typi- 
cal scores for various special classifications, such as age or 
grade, are determined, and norms are established. 

6. Other problems which the test maker faces are:

a. How valid is the test?
b. How reliable is the test?
c. How can scores from various forms of the test be
made comparable?
d. What kind of norms should be presented (age, grade,
percentile, standard scores)?
e. How can the results be interpreted and utilized?

Information concerning these points is generally available in
the test manual.

Although many of the procedures listed above might be
described in greater detail, it is clear that the standardization
of a test is a long and arduous task requiring thought and a
high degree of skill on the part of the test maker. It is clear,
too, that norms have a very definite reference point—the
group or groups that comprise the standardization sample.
When the classroom teacher uses norms, he compares the test
results of his pupils with the over-all results from certain
specific groups. The teacher can generalize the results of testing
his group as a comparison with all pupils in a specific
classification only to the extent that the original group is indeed
representative of the entire population in the classification he
is using.


MAKING USE OF AGE AND GRADE NORMS 


Age norms show the standing of the pupil by relating his 
test score to the score which is typical of a particular age 
group. Age norms are developed as follows: 

The test is administered to various age groups, such as
children between the ages of 6 and 12 years. A typical score (the
mean or median) is found for each age group and perhaps
each half year of chronological age. Suppose the results are
similar to those in Table 1. The score column gives
the total range of test scores on the hypothetical vocabulary
test. Typical scores for each age group are represented by the
heavily lined bars in each column. These may be the medians,
or midpoints, or they may be the average scores for each age
group. A score of 25, then, is the six-year norm, or the score
typical of the six-year group. A score of 28 is the seven-year
norm, and so on. Ordinarily, tests utilizing age norms are so
designed that typical scores for lower age groups are lower
than those for higher age groups. Thus a series of graduated
age steps or levels is established.

TABLE 1. AGE NORMS FOR A VOCABULARY TEST
[Frequency chart: rows are scores from 20 to 50; columns are ages
in years, 6 through 12.]

NOTE: Crosses (x) represent frequency of scores. The heavily
outlined score cell is opposite the norm or “typical” score for the age
group. Hence the norm for age 6 is a score of 25.

The crosses in Table 1 represent the distributions of scores
for each age level. It will be noted that the scores vary around
the norm; that is, not all six-year-olds achieve a score of 25
points; some of them score higher than typical seven-year-olds.
In the seven-year-old group, some children score below
the six-year norm. This is not peculiar; it is a common feature
of age or grade distributions.


Now we are ready to use our highly simplified table of age
norms.

1. Suppose that June is eight years old. Her test score is 25.
Her vocabulary age is six years. This is interpreted as meaning
that on this vocabulary test June's score is at the level typically
attained by the six-year-olds in our standardization sample.

2. John's chronological age is nine years. His score is 37,
which is the score typical of eleven-year-olds in our sample.
His vocabulary age is eleven years on this particular test.

In this way age norms enable the teacher to compare a 
child's score with the level of attainment of specified age 
groups. In addition, this comparison is based upon specific test 
materials. The test we have described is based on a sampling 
of vocabulary items, and the scores are related to the attain- 
ment of groups of children who comprised the standardization 
sample. The interpretation and use of age norms must be 
based upon a recognition of these factors in addition to those 
that are common to all tests, such as validity and reliability. 
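
The look-up performed for June and John can be sketched as a short
routine. This is an illustration under stated assumptions, not a
published procedure: only the norms for ages 6, 7, and 11 (scores of
25, 28, and 37) come from the discussion above, the remaining table
values are invented to complete the example, and the function name
age_equivalent is hypothetical.

```python
# Typical score for each age group. Ages 6, 7, and 11 follow the text;
# the values for ages 8, 9, 10, and 12 are assumed for illustration.
AGE_NORMS = {6: 25, 7: 28, 8: 30, 9: 33, 10: 35, 11: 37, 12: 40}

def age_equivalent(score, norms=AGE_NORMS):
    """Return the age whose typical score lies closest to the score.

    Ties go to the younger age. Scores beyond either end of the table
    are assigned the youngest or oldest age listed.
    """
    return min(sorted(norms), key=lambda age: abs(norms[age] - score))

june_age, june_score = 8, 25
john_age, john_score = 9, 37

print("June: chronological age", june_age,
      "vocabulary age", age_equivalent(june_score))
print("John: chronological age", john_age,
      "vocabulary age", age_equivalent(john_score))
```

The result is tied to this one table: a different test, or a
different standardization sample, would give a different set of
norms and possibly a different vocabulary age for the same pupil.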

The teacher will find a variety of tables of age norms ac- 
companying the group tests which are widely employed in 
classrooms. Among these are mental (MA), educational (EA),
and reading (RA) age norms. It is possible to derive arith- 
metic, vocabulary, and other age norms when appropriate test 
materials are employed. 


The mental age, or MA, is derived from scores on tests
which purport to measure intelligence, mental maturity,
general mental ability, and so on. The educational age, or EA,
is derived from sets of norms for varying age groups which
have been based on test results in the area of educational
achievement or scholarship. Reading age norms are developed
on the basis of tests of reading skill or attainment.

In all the above instances, the norms must have reference
to (1) definite areas of ability and achievement, (2) the
specific items used to sample these areas of ability or
achievement, and (3) the group or groups of pupils who formed the
standardization sample which purports to be representative
of the pupil population at the specified age levels.

Age norms help the teacher to think about children as they
are and as they compare with others. One of our serious edu- 
cational tasks is to locate the pupil in the academic milieu. 
Where shall we start with Mary? How well can Betty read as 
compared with other pupils? What materials are best suited 
to Anne's needs and capacities? What level of achievement 
should be expected of Roger? Age norms are not the answers 
to problems; rather they serve as markers or guides which 
are of value in so far as their meanings and implications are 
clearly understood. 

Perhaps the type of norm most widely used is the grade 
norm, particularly at the elementary school level. Grade 
norms are developed in much the same way as age norms, but 
the reference point for grade norms is, of course, grade level 
rather than chronological age. The procedure might be sum- 
marized as follows: 

1. The test (for example, an achievement test) is admin- 
istered to specified grade groups, say, grades four through 
eight. 

2. Typical scores (medians or averages) are worked out 
for each grade level. Perhaps additional central scores are 
found for each half term at each grade level. 

3. These typical, or central, scores become the norms for 
the grades, and intermediary points representing number of 
months in the grade are developed. Hence the teacher finds 
that a test score of, say, 68 represents a grade placement of 
4.6, which would indicate that this score represents the typi- 
cal performance of pupils who have been in the fourth grade 
for a period of six months. Grade norms are ordinarily based 
upon the supposition of a ten-month school year. A study of 
the norms tables for any achievement test which gives grade 
norms will indicate whether the base is a ten- or twelve- 
month year. 
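
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular test publisher: the table `GRADE_MEDIANS` and the function `grade_placement` are hypothetical, the median scores are invented so that a score of 68 falls at grade placement 4.6, and the intermediary points are found by straight-line interpolation on a ten-month school year.

```python
# Hypothetical median scores for pupils at the start of each grade.
# The numbers are invented so that a score of 68 falls at grade 4.6.
GRADE_MEDIANS = [(4, 56), (5, 76), (6, 92), (7, 106), (8, 118)]


def grade_placement(score):
    """Interpolate linearly between adjacent grade medians.

    With a ten-month school year, each tenth of a grade is one month.
    Returns None when the score lies outside the table.
    """
    for (low_grade, low_score), (high_grade, high_score) in zip(
        GRADE_MEDIANS, GRADE_MEDIANS[1:]
    ):
        if low_score <= score <= high_score:
            fraction = (score - low_score) / (high_score - low_score)
            return round(low_grade + fraction * (high_grade - low_grade), 1)
    return None


print(grade_placement(68))  # 4.6: fourth grade, sixth month
```

Because only the grade medians rest on actual sampling in this sketch, the decimal part of the result is an interpolated estimate rather than a measured value.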

To read the table of norms, the teacher finds the pupil's 
raw score. Mary, for example, in the sixth grade scores 120
on a general-achievement test. A score of 120 is equivalent
to a grade norm of, say, 8.3. Mary's score is typical of pupils 
who have been in the eighth grade for a period of three 
months. This decimal part of the figure (three months) is 
very probably an estimate or approximation, as norms are 
not usually worked out for each month of the school year on 
the basis of actual sampling; such a task would be endless.
However, this norm (grade placement 8.3) helps the teacher
to understand his problem with Mary in a sixth-grade class.
Her scholastic attainment is obviously quite superior by com- 
parison with the reference group (the pupils used as a basis 
for the standardization of the test), and she is likely to find 
the usual sixth-grade materials quite easy and perhaps boring. 
Work designed to challenge Mary would be about at the 
level suited to beginning eighth graders, or, at any rate, some- 
what richer and more varied than that which is suitable for 
typical sixth graders. Thus the norm helps us to locate Mary 
on the academic ladder and provides a reference point in 
planning for her. 

Perhaps grade norms are popular with teachers simply be- 
cause of their familiarity with the grade system. The ele- 
mentary school teacher is likely to be quite familiar with 
third, fourth, or fifth graders and with the quality and level of 
achievement of pupils in these grades, just as junior and 
senior high school teachers are familiar with the level of 
achievement of their grades. The system of grades is deeply 
ingrained in our educational thinking. Hence, the location of
a child in terms of a grade norm gives the teacher a basis for
planning regardless of the actual grade placement of the
pupil.


THE MEANING OF PERCENTILE RANKS 


Comparing a child with his peers is perhaps one of the
most useful ways of giving meaning to a test score. Age
norms, as we have seen, locate the pupil in terms of age
groups but not necessarily with pupils of his own age or 
grade. Comparisons with others of the same age or grade are 
most commonly made in terms of the percentile rank of the 
score a pupil attains on a given test. Percentile ranks indicate 
the relative standing of the pupil in a defined age or grade 
group. 

If, for example, Jean's score on a particular test places her 
at percentile 20 for fifth-grade pupils, this means that her 
score is better than those of 20 per cent of the fifth graders 
who comprised the standardization sample. However, 80 per 
cent of this group of fifth graders made higher scores than 
Jean. The norm indicates Jean's standing among fifth graders, 
and it must be understood that the 20 does not refer to 20 
per cent knowledge of the test area or imply that Jean an- 
swered 20 per cent of the questions correctly. 
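
The distinction between percentile rank and per cent correct can be made concrete with a short Python sketch. The scores, the fifty-question test length, and the function name `percentile_rank` are all invented for illustration; the rank is computed here as the per cent of the reference group scoring below the pupil, which is the definition used in this section.

```python
def percentile_rank(score, group_scores):
    """Per cent of the reference group whose scores fall below `score`."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)


# Invented scores for ten fifth graders on a test of 50 questions.
fifth_graders = [12, 18, 21, 24, 26, 27, 29, 31, 35, 41]
jean_score = 21

print(percentile_rank(jean_score, fifth_graders))  # 20.0: two of ten scored lower
print(100 * jean_score / 50)                       # 42.0: per cent of questions correct
```

In this invented group Jean stands at percentile 20 although she answered 42 per cent of the questions correctly; the two figures answer different questions.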

The example above indicates further that the percentile 
norm is a separation point in a distribution. That is, a per- 
centile rank of 85 is a point which separates the upper 15 per 
cent from the lower 85 per cent of the group. Hence, this 
type of norm gives us the relative rank of pupils, an indica- 
tion of their relative standing in a hypothetical group of one 
hundred. 

Among the difficulties encountered in utilizing ranking 
measures such as percentile norms is the fact that the ranks 
do not represent equal units of measurement. The foot rule, 
for instance, is divided into twelve inches, each identical 
with every other. Suppose that the inches on a ruler were not 
of uniform length—that the "inches" at each extreme are 
excessively long and those near the center excessively short, 
as in the diagram in Figure 2. On such a rule, the inches have 
different meanings as units of measurement, depending upon 
their location on the rule. 


An analogous situation prevails in the use of percentile
ranks as units of measurement. These flexible units are
“shorter” near the center of any distribution in the sense that
they imply smaller differences in actual score points near the
center of the distribution; they are “longer” near either
extreme in the sense that score differences between percentiles
are greater at the extremes. That is, in terms of actual points
scored on a given test, a difference of five percentile ranks at
the upper extreme (say, percentile 90 to 95) may mean a
difference of twenty score points on the test. On the other
hand, a difference of five percentile ranks near the center of
the distribution (say, percentile 50 to 55) may mean a
difference of only two or three score points.


Fig. 2. Illustration of unequal units of measurement. The diagram
shows a foot rule so divided that the units of measurement or
"inches" are of unequal proportions.

This actual inequality of differences which appear alike
when converted to percentile ranks is explained by the fact
that large numbers of individuals tend to score in the middle
range of a test, and few are found at either extreme. Since
percentile ranks are based on proportions of the group
included at or below certain points, the ranks jump rapidly
wherever large frequencies occur. Conversely, percentile
ranks increase slowly where frequencies are small.
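
A rough numerical check of this point is possible if we assume, purely for illustration, that scores follow a normal distribution with a mean of 50 and a standard deviation of 10. The sketch below uses Python's standard `statistics.NormalDist`; the helper `score_gap` is a hypothetical name, and the resulting figures depend entirely on the assumed distribution.

```python
from statistics import NormalDist

# Assumed score distribution: normal, mean 50, standard deviation 10.
scores = NormalDist(mu=50, sigma=10)


def score_gap(lower_pct, upper_pct):
    """Raw-score difference between two percentile points."""
    return scores.inv_cdf(upper_pct / 100) - scores.inv_cdf(lower_pct / 100)


middle_gap = score_gap(50, 55)  # five percentile ranks near the center
upper_gap = score_gap(90, 95)   # five percentile ranks near the top
print(f"50 to 55: {middle_gap:.1f} points; 90 to 95: {upper_gap:.1f} points")
```

Under these assumptions the same five percentile ranks cover nearly three times as many score points near the top as near the center. Actual test distributions may show a larger or smaller contrast.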

To use another analogy, think of a group of a hundred
persons running a race. One is outstanding and defeats the
others. He is better than the other ninety-nine, and his
percentile rank is 99. The person who ranks second is quite
some distance behind the first, but distance does not affect his
rank order. He is better than ninety-eight of the runners;
hence his percentile rank is 98. The large group of average
runners are probably massed at one point on the race track
and reach the finish line in a group. The one who finishes
fiftieth is better than fifty of the runners, so his percentile rank
is 50. He has probably just barely defeated a number of others
who are very close behind him, but he ranks 50 and the
next runner 49 regardless of the very short distance between
the two.

[Figure 3. Diagram of a race track from Start to Finish, with the
percentile rank of each runner marked.]

Fig. 3. The meaning of percentile ranks. Ten runners are engaged in
a race. The winner, at A, is better than the other nine runners (90
per cent); therefore his percentile rank is 90. His percentile rank is
ten "points" better than the rank of runner number two, at B, who
attains the percentile rank of 80, since runner number two is better
than eight of the ten runners, or 80 per cent. Note, however, the long
distance which separates the two runners at A and B.

The runners in area C are grouped closely together, but each
achieves ten percentile-rank "points" above the nearest succeeding
runner, since each comprises 10 per cent of the total group. Ten
percentile "points" at C appear to be the same as ten percentile
"points" at A but give no information as to the actual distance
separating the runners. This grouping near the center of a
distribution, as at C, is a common feature of the score distributions
of educational and mental tests.


At the slow extreme, perhaps one runner is trailing far
behind. He is last, and his percentile rank is zero. The nearest
runner in front may be far ahead of him, but that nearest one
is better than only one runner, the last. His percentile rank is
one regardless of the distance that separates him from the
last man.

Figure 3 shows this situation for ten runners. It is apparent
that percentile rank depends entirely on the number of people
below one in the distribution. Thus rank is independent
of score value or number of items answered correctly.
The teacher who utilizes percentile ranks should keep in mind
that these norms represent variable units of measurement,
especially when he is tempted to make distinctions among
pupils on the basis of percentile-rank differences.

In spite of these characteristics, percentile norms are as
useful to the teacher as any ranking system, for they indicate
the relative standing of the pupil in a defined age or
grade group and thus help the teacher understand the pupil,
plan for him, and develop reasonable expectations for him.
They "locate" the pupil among others of his own age or
grade.
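
The race analogy can be sketched in Python as follows. The finishing times are invented, and `race_percentile_rank` is a hypothetical helper that simply counts how many runners each man defeats; it is offered as an illustration of the ten-runner case, not as a standard formula.

```python
# Invented finishing times in seconds for ten runners.
times = [52.0, 58.5, 61.0, 61.2, 61.3, 61.5, 61.6, 61.8, 62.0, 70.0]


def race_percentile_rank(time, all_times):
    """Per cent of the field that this runner beat (finished ahead of)."""
    beaten = sum(1 for t in all_times if t > time)
    return 100 * beaten // len(all_times)


for t in sorted(times):
    print(f"time {t:5.1f} s   percentile rank {race_percentile_rank(t, times)}")
```

The winner leads the second runner by 6.5 seconds, and the runners in the middle are separated by only a fraction of a second, yet every step in the ranking is the same ten "points." The ranks record order of finish and nothing about the distances between runners.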


STANDARD SCORES AND MEASUREMENT PROBLEMS


Standard scores represent the attempt to develop equal
units of measurement. As we have seen, percentiles are in
reality ranks which have no uniform reference to size of score.
Standard scores, on the other hand, are determined by the
number of points the pupil scores rather than by his rank
order in a group. Standard-score units are equal throughout
the distribution.

One of the problems in educational measurement is the fact
that there is no such thing as an absolute zero. In other
words, there is no beginning point, as there is with linear
measurement. Where, for example, does intelligence begin?
Is there such a thing as zero intelligence? Since such starting
points cannot be established in developing measurement
systems based on standard scores, a central point is used as a
reference. This point is the mean, or average, score. In
deriving standard scores, then, the first step is to locate the
average score of the group.

The next step is based on the assumption that score
distributions will normally follow a rather definite pattern. This
pattern is called the normal curve and is represented by a
symmetrical, bell-shaped curve with definite mathematical
properties, as shown in Figure 4. The high parts of the curve
represent the highest frequencies of scores, at C, M (the
mean), and D. At B and E the shorter vertical lines represent
fewer scores as we approach either extreme. At the extreme
left and right, the extremely low and high ends of the
scale, scores are very infrequent; this, as we have seen, is the
distribution found where large numbers of persons are
measured with respect to almost any characteristic. For example,
suppose we consider height of adults. Very few persons are
extremely short, very many are about average in height, and
very few again are extremely tall. This, then, is the type of
distribution we expect to find (or normally find) when
adequate measures of human characteristics are applied to large
groups chosen at random from the population. Standard
scores are based upon this assumption and may be derived
only with reference to distributions which approximate the
normal curve.

[Figure 4. Bell-shaped curve with vertical lines at points A, B, C,
M, D, E, and F along a base line marked -3σ, -2σ, -1σ, 0, +1σ,
+2σ, +3σ.]

Fig. 4. The normal curve. The symbol σ represents standard deviation,
a measure of the dispersion of scores around the mean (M) or
average.
Assuming that the bell-shaped curve represents the normal
distribution of test scores, the problem resolves
itself into that of making allowances for the high frequency
of scores around the center of the distribution and the low
frequency at the extremes. To do this, scores near the
extremes are given added weight depending upon the extent of
their deviation (in score units) from the average score. The
larger the deviation of the score from the center, the greater
the weighting. This process gives a result analogous to that
shown in Figure 5.

[Figure 5. Diagram of scores A, B, and C in relation to the average
or mean, M; the low end of the scale is labeled.]

Fig. 5. Diagrammatic representation of the weighting of standard
scores in terms of deviation from the mean.

Five pupils receive score B, which is relatively close to M,
the average score. Since score B does not deviate greatly from
the average, this score receives little weight, even though it is
the score attained by five pupils. Score A is near the lower
extreme of the total distribution of scores. It is equivalent to
score B in weight even though only one pupil scores at this
low level. Score C, at the upper extreme, is also equivalent
in weight to score B, even though only one pupil attains a
score this high. This weighting of scores at the extremes
prevents the difficulties of interpretation that occur in using
percentile ranks, since the weighting takes into account not only
deviation of the score from the average but also to some
extent frequency of scores. The result is equal units of
measurement along the base line, that is, in terms of raw-score points.

These units of measurement are based on the standard devi- 
ation, a measure of variability or dispersion of scores around 
the central point in a distribution. Standard deviations under 
the normal curve are presented in Figure 6. The mean or 
average is arbitrarily assigned a value of zero (no deviation 
from the center of the curve), and four standard deviation 
units have been indicated on either side of the mean (note 
that each standard deviation unit represents an equal distance 
along the base line of the curve). Opposite A are indicated 
the proportions of scores or measures which, assuming a nor- 
mal distribution, may be expected to fall within each area of 
the curve marked off by the vertical lines. Approximately 68
per cent (roughly two-thirds) of the scores fall in the area
within one standard deviation above and below the mean or
average, and roughly one-sixth of the scores fall below and
one-sixth above these points.
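
These proportions can be checked against the theoretical normal curve. The sketch below uses `NormalDist` from Python's standard `statistics` module; the variable names are ours, and the figures hold only for a distribution that is exactly normal.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # mean 0, standard deviation 1

# Area between one standard deviation below and one above the mean.
within_one_sd = unit_normal.cdf(1) - unit_normal.cdf(-1)
# Areas in the two tails beyond those points.
below_minus_one = unit_normal.cdf(-1)
above_plus_one = 1 - unit_normal.cdf(1)

print(f"within one SD of the mean: {within_one_sd:.1%}")
print(f"below -1 SD: {below_minus_one:.1%}; above +1 SD: {above_plus_one:.1%}")
```

The theoretical values are about 68 per cent within one standard deviation and about 16 per cent in each tail, which is the "roughly one-sixth" mentioned above.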

Standard scores are based on standard deviations and are
centered around the average score attained by pupils in the
standardization sample. A type of standard score, the T score,
indicating the equal base-line units, is shown opposite B in
Figure 6 with a score of 50 representing the mean or average.
The base line in this figure illustrates the fact that
standard-score units are equal in base-line length. The base line here
represents score points.
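
As a minimal sketch of how such a scale might be computed, the Python fragment below converts raw scores to T scores by the linear rule T = 50 + 10(X - M)/σ. This rule is an assumption on our part: it agrees with the description here (mean at 50, ten T-score points to each standard deviation), but some published T scores are derived by other methods, so the manual for the particular test should be consulted. The function name `t_scores` and the raw scores are invented.

```python
from statistics import mean, pstdev


def t_scores(raw_scores):
    """Convert raw scores to T scores (mean 50, standard deviation 10)."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # standard deviation of the whole group
    return [50 + 10 * (x - m) / sd for x in raw_scores]


raw = [30, 40, 50, 60, 70]  # invented raw scores
print([round(t, 1) for t in t_scores(raw)])
```

Equal differences in raw score (ten points in this example) produce equal differences in T score wherever they occur in the distribution, which is the property that percentile ranks lack.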


[Figure 6. Normal curve. Opposite A, the proportions of scores within
each area of the curve; opposite B, T scores along the base line.]

  Standard deviations:   -4   -3   -2   -1    0   +1   +2   +3   +4
  T scores (B):          10   20   30   40   50   60   70   80   90
  Approximate per cent of scores falling below standard score points:
                                   2%  16%  50%  84%  98%

Fig. 6. Standard scores, showing equal base-line units and
approximate per cent of scores included below stated standard-score points.

Opposite A above are shown the proportions of the population
likely to receive standard scores within the limits specified. For
example, approximately 16 per cent of the population falls below
standard score 40. Only 2 per cent of the population exceeds
standard score 70. The percentages employed here are based
on the theoretical normal curve.

Fig. 7. Percentile ranks and standard scores.

a. Percentile (or centile) ranks represent unequal units along the
base line. Where frequencies are great, percentile ranks “stretch” high
and narrow. The base line represents points scored on the test. The
diagram shows the effect where the group is a hundred pupils.

M is the center of score distribution. B is a score attained by five
pupils near the center of the distribution. It represents five percentile
ranks piled up above one score point. A is near the lower extreme.
Five pupil scores are spread out over this area of five score points.
Here the base line extent is five times that at B for the equivalent
percentile ranks.

b. Standard scores represent equal units along the base line. They
accommodate many or few scores retaining their unit length on this base
line. The diagram shows the effect where the group is a hundred pupils.

M is the center of the score distribution. B is a score point near the
center achieved by five pupils. It has no more weight and occupies
no more of the base line (score points on the test) than score A,
which is attained by only one pupil but is at the lower extreme of
the distribution. These equal units of measurement throughout the
scale result from the weighting of scores at the extremes to
compensate for lower frequencies at these levels.

The uses and advantages of standard scores may be sum- 
marized as follows: 

1. Standard scores represent equal units of measurement 
and hence facilitate comparisons regardless of the area of the 
distribution of scores under consideration. 

2. Standard scores from different tests are comparable to 
the extent that they may be averaged or combined (if the as- 
sumption of normality seems to be warranted). This applies 
even though the tests may contain different numbers of items
and one test may be more difficult than the other. 

3. Standard scores are based essentially on score points on 
the test rather than on rank order. This facilitates interpreta- 
tions in terms of ability or achievement as represented by test 
score and indicates relative standing in the group at the same 
time. 

4. Mathematically, standard scores have other values. The
zero point is always the mean (standard score 50 as described
here). The mean is the most stable measure of central
tendency, that is, the most stable central score. The range
between standard scores of 40 and 50 represents one standard
deviation below the mean (or average). The standard deviation
is the most reliable measure of variability within the group and
has many statistical uses. (See Figure 7.)
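
The second advantage listed above, that standard scores from different tests may be averaged, can be illustrated with a short Python sketch. The test names, the means and standard deviations, and the pupil's raw scores are all invented, and the linear conversion T = 50 + 10(X - M)/σ is assumed; the averaging is meaningful only to the extent that both score distributions approximate the normal curve.

```python
def t_score(raw, group_mean, group_sd):
    """T score of one raw score, given the norm group's mean and SD."""
    return 50 + 10 * (raw - group_mean) / group_sd


# Invented norms: (mean, standard deviation) for two tests of
# different length and difficulty.
norms = {"arithmetic": (40, 8), "reading": (120, 20)}
pupil_raw = {"arithmetic": 48, "reading": 110}

pupil_t = {test: t_score(pupil_raw[test], *norms[test]) for test in norms}
average_t = sum(pupil_t.values()) / len(pupil_t)

print(pupil_t)    # {'arithmetic': 60.0, 'reading': 45.0}
print(average_t)  # 52.5
```

The raw scores of 48 and 110 cannot be compared or averaged directly, but on the common T-score scale the pupil is seen to stand one standard deviation above the mean in arithmetic and half a standard deviation below it in reading.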


NORMS AND THE TEST MANUAL 


Before giving a test, the teacher should read carefully the 
directions for administering it and the descriptions of norms 
presented in the manual of directions. It is important that the
teacher follow these directions, since the norms have been
developed on the basis of these specific regulations and
requirements and any deviation from the required procedure
may invalidate them. Pupil scores can be interpreted on the 
basis of the norms provided only when the test has been ad- 
ministered under the standard conditions set forth in the 
manual. The directions for scoring the test must be care- 
fully followed, since deviations from the standard methods of 
scoring also will invalidate the norms. 

The test manual will also include a description of the types 
of norms provided, and the teacher should study these descrip- 
tions carefully in order to interpret the norms correctly. For 
example, we have defined percentile rank as a point of separa- 
tion which marks off one proportion of the group from an- 
other. In other sources the teacher may read that a pupil's
percentile rank indicates the per cent of the pupils in the
group that he equals or excels in score on the given test.*
Interpretation of pupil scores will vary with the definition of
norms, and it is important that the teacher study the definition 
given in the manual of directions for the particular test he is 
giving. Only on this basis are accurate interpretations possible. 
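
How much the definition matters can be seen in a small Python sketch that applies both definitions to the same invented group of ten scores. The function names `rank_below` and `rank_at_or_below` are hypothetical labels for the two definitions mentioned above.

```python
def rank_below(score, group):
    """Per cent of the group scoring strictly below `score`."""
    return 100 * sum(1 for s in group if s < score) / len(group)


def rank_at_or_below(score, group):
    """Per cent of the group that `score` equals or excels."""
    return 100 * sum(1 for s in group if s <= score) / len(group)


group = [10, 14, 14, 14, 17, 19, 20, 22, 25, 30]  # invented scores

print(rank_below(14, group))        # 10.0
print(rank_at_or_below(14, group))  # 40.0
```

Because three pupils in this group share the score of 14, the same score stands at percentile 10 under one definition and percentile 40 under the other. The difference is greatest where many pupils earn the same score.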

In addition, the teacher must examine the test manual to 
discover the nature of the sample population on which the 
norms were based, for they will be of little value for use with 
pupils who differ markedly from the normative groups. For 
example, children of migratory workers and children in iso- 
lated or slum areas are not likely to compare favorably with 
the norms usually presented with educational tests. 


* Arthur E. Traxler, Techniques of Guidance, New York: Harper &
Brothers, 1945, p. 182.


SUMMARY


The task of persons interested in educational evaluation is
to establish meaningful methods of measuring pupil status
and progress relative to worthwhile educational goals. The
development of norms has facilitated educational evaluation
to the extent that they give the teacher a basis for comparison
of the pupil with others of specified ages or grades. The
purpose of norms is to place scores on tests in a framework
which helps the teacher to make comparisons or to relate
scores to one another.

Norms are derived from a study of the test scores achieved
by large groups of pupils of specified age or grade levels.
They are measures which relate the score of one pupil to the
scores of others. Among the kinds of norms frequently
employed are age, grade, and percentile norms, and standard, or
T, scores.

Age norms relate the individual's score to the score
typically attained by pupils of a specified age group. Age norms
have been developed for mental age, educational age, and
reading age. Mental age is derived from scores on tests of
mental maturity, mental ability, or intelligence. Educational
age relates the individual's score to the attainment of age
groups on tests of general educational achievement. Reading
age is based upon relative attainment in skills and
understandings relating to reading.

Grade norms represent the typical attainment of pupils at
specified grade levels. Although they may be used with tests
of aptitude or attainment, they are most commonly used in
connection with tests of achievement in the basic academic
fields or as a measure of general educational achievement.
Grade norms help the teacher to determine the level of
achievement of the pupil with reference to school grade.

Percentile norms or ranks indicate the relative standing of
the pupil in a defined group such as an age or grade
classification. They are rank-order measures which have no reference
to size of score except as this determines rank order in a series
of scores. These norms are, however, widely used, and they
are meaningful and helpful, as they indicate the pupil’s
standing among stated groups with reference to the measured
characteristics.

Standard scores represent equal units of measurement, have 
a wide range of application, and facilitate accurate interpreta- 
tions. They relate a pupil's score to scores attained by specific 
groups, such as age or grade groups. 

The teacher who plans to utilize the norms presented in 
test manuals should adhere strictly to the directions for ad- 
ministering and scoring the test, as the norms are derived on 
the basis of these standard procedures. He should also study 
the definitions, methods of derivation, and interpretation of 
the norms in order to evaluate and interpret them and to un- 
derstand the implications of certain variations in definition 
and procedure which may apply to the specific test he is using. 


STUDY AND DISCUSSION EXERCISES 


1. Define the term norm. Indicate the essential difference be- 
tween raw scores and norms. 

2. In what ways may the concept of norms be misused by 
teachers in the evaluation of pupils in the classroom situation? 

3. Anne has a mental age of ten years two months according 
to a group test of mental ability. What additional data would be 
necessary to an adequate interpretation of this derived score? Give 
your reasons for considering each item essential to interpretation 
of the test result. 

4. A standardized achievement test provides age, grade, and 
percentile norms as a basis for interpretation of results. Cite the 
advantages and limitations of each type of norm presented. 

5. Mary, a fifth-grade girl, achieves a score of 65 on the 
arithmetic section of a standardized achievement test. This score 
is equivalent to percentile 70. How would you interpret this score 
in terms of the available data? 

6. Explain the difference between standard score and percentile 
norms. What are the particular advantages of each? 

7. Differentiate between norms and standards of achievement. 

8. Johnny has taken three different standardized general-
achievement [...] norms differ. On battery A his achievement [...]


CHAPTER FIVE


Estimating Capacity for Learning 


Among the teacher's more urgent problems is that of estimat- 
ing capacity for learning—of discovering what scholastic per- 
formance to expect of pupils. Is Jim learning as rapidly and 
as well as can be expected? Is the classroom work too easy or
too difficult for Mary? What kinds of materials are best suited 
to Don's abilities? Is Jane's poor work in school due to lack 
of ability or to some other factor? These are among the every- 
day problems of the classroom teacher. 

It is as important to discover the pupil's capacity for learn- 
ing as to determine what he is learning. It is widely recog- 
nized that pupils differ markedly in ability to succeed in 
schoolwork and that therefore uniform standards of achieve- 
ment for all are unrealistic and undesirable. No teacher ex- 
pects all children to conform to a given standard in height or 
weight; and certainly every teacher recognizes that forced 
feeding, stretching, pulling, pushing, or any other kind of pres- 
sure would be entirely useless in adjusting the pupil's height 
to classroom standards. Although it is sometimes less obvious, 
educational "pressure" techniques are no more likely to equal- 
ize the learning capacity of pupils. 

Recognition of the existence of individual differences implies
acceptance of these differences as educational facts
which form the background for the teacher's work. Indeed, a 
major function of testing is to enable teachers to understand 
and work with the differences that inevitably exist, such as 
Johnny's not learning so readily as Billy. The only satisfactory 
educational practice is to exert every possible effort to adapt 
the curriculum to the educational potentialities of individual 
pupils. 

Adapting the curriculum to the learning potentials of in- 
dividual pupils means careful study by the teacher of the ap- 
titudes of his pupils for schoolwork. By utilizing the evidence 
from tests of intelligence—or tests of scholastic or educa- 
tional aptitude, as they might more properly be called—the 
teacher gathers objective data upon which to base judgments 
and expectations. Tests do not give answers; they provide 
data that are valuable to the teacher to the extent that he uses 
them wisely in making judgments. 


WHAT INTELLIGENCE TESTS MEASURE 


The definition of intelligence is subject to a great deal of
controversy involving a number of points of view. Since our
purpose here is to help the teacher use test results effectively,
we shall limit our discussion to the kinds of abilities that are
commonly evaluated by so-called “intelligence tests.” The
teacher is urged to examine the tests he uses, read the
manuals carefully, and decide what specific abilities are tested by
the material. An intelligence-test score, like any other test
score, is significant only for the materials included in the
particular test, and the teacher's interpretation of the score must
first of all be related to the kinds of performances required of
pupils by the test.

Items commonly included in intelligence tests are designed
to sample abilities such as the following:

1. Memory: immediate or delayed, meaningful or rote.

2. Ability to deal with verbal materials (vocabulary).

3. Ability to deal with spatial relationships or to orient the
self in space.

4. Ability to deal with verbal relationships (analogies,
opposites).

5. Ability to deal with numerical materials either as sheer
facility with numbers or as ability to reason numerically
or quantitatively.

6. Ability to find the guiding principle involved in tasks
which may be verbal, numerical, spatial, or pictorial in
nature.

7. Ability to perceive essential details, make fine
distinctions, and notice similarities.


Tests of mental ability may include material on as few as 
three or four of the aspects of intelligence outlined above or 
on all or almost all of them. One test may deal primarily with 
verbal materials; another may require manipulation of blocks, 
the solution of puzzles, or the interpretation of pictures. For 
example, vocabulary may be tested by items such as the fol- 
lowing, which differ in the emphasis placed upon language 
skills: 

Words: a vocabulary test of the type which requires some 
reading skill. The pupil selects the one word, A, B, C, or D, 
which has the same meaning as the initial word. 


1. big; A fair B windy C soft D large 


Pictures: a test of vocabulary which does not require read- 
ing skill. The teacher pronounces the word dog. The pupil 
finds and indicates the picture which corresponds to the 
spoken word.* 


* L. L. Thurstone and T. G. Thurstone, S.R.A. Primary Mental
Abilities, Elementary Form AH, Chicago: Science Research
Associates, Inc., 1948.

[Figure: sample Pictures item.]

Reasoning ability may be evaluated by items which
differ in the degree of emphasis upon verbal skills. Of the
following, the word-grouping item utilizes verbal materials,
whereas in the figure-grouping item language skill does not
play an essential part.

Word grouping: The pupil is required to indicate the word
which does not belong with the others.


A red B blue C heavy D green


Figure grouping: The pupil is asked to indicate the figure
which does not belong with the group.

[Figure: sample figure-grouping item, a row of lettered figures.]

Items like the following, in which the pupil selects the
appropriate analogy, may be the language or nonlanguage
type:†


* Ibid.

† V. A. C. Henmon and M. J. Nelson, The Henmon-Nelson Tests
of Mental Ability, Form B (Grades 7-12), Boston: Houghton Mifflin
Company, 1932.


72 EVALUATION TECHNIQUES FOR CLASSROOM TEACHERS 


63: Land is to peninsula as ocean is to: 1 gulf, 2 lake, 3 cape, 
4 river, 5 island. 

The particular emphasis of the test items will be reflected 
in the results of the tests for each pupil. Jimmy, for instance, 
has little facility with words and will not do very well on a 
test which is heavily loaded with verbal items. On the other 
hand, he may shine on a test which requires manipulation of 
materials and the solution of problems involving concrete ob- 
jects or pictorial situations. But the fact that the items in the 
verbal test do not tap his best abilities does not mean that it 
is useless for Jimmy to take the test. The test will indicate 
Jimmy's weakness in verbal areas, and this is valuable in- 
formation for the teacher, whether it is news or merely a con- 
firmation of opinion. 

The teacher who is interpreting Jimmy's test results, how- 
ever, must keep in mind that a verbal test represents a 
sampling of only a limited number of abilities. The teacher 
will recognize the significance of more inclusive measures. 
The best way to promote Jimmy's development in verbal 
skills and meanings may be through the areas in which he 
demonstrates greatest capacity for learning. When this is the 
case, the teacher should seek a test or a battery of tests which 
gives a more inclusive picture of the pupil's capacities. 

It is particularly important that a test which samples vari- 
ous areas of intelligence rather than probing “general” intel- 
ligence be used for adolescent students. Recent investigations 
have indicated that specific abilities become more sharply dif- 
ferentiated as the individual develops toward maturity. In sta- 
tistical terms, it has been found that intercorrelations between 
such characteristics as memory, verbal abilities, and number 
facility are not so extensive in fifteen-year-olds as in twelve- 
year-olds. “The correlations from these various studies indi-
cate clearly that different aspects of intelligence are being
measured and that independence of mental traits, or differen-
tiation among traits, increases with age through the adoles-
cent years.”⁴ Two practical implications may be derived from
these data. (1) In studying adolescents, tests of general men-
tal ability will be less informative than tests that provide a
profile of abilities. (2) As the child grows, teachers find more
opportunities to capitalize upon and thus promote the devel-
opment of the specific abilities as these become more clearly
differentiated.

⁴ David Segal, Intellectual Abilities in the Adolescent Period,
Federal Security Agency, Office of Education, Bulletin 1948,
no. 6, p. 10.

It is well, also, to understand that there are aspects of
adjustment and functioning that are not measured by
intelligence tests. For example, the pupil's determination to
use what intelligence he has to best advantage is not measured.
Health conditions and drive may have a deleterious or
invigorating effect upon intellectual functioning. The paucity
or richness of experience conditions the development and
functioning of intellectual capacity. Experiential background
is not likely to be indicated. Social intelligence (ability to
get along with others without excessive emotional tension) has
not yet been isolated as an aspect of intelligence, yet it does
affect schoolwork and personal adjustment.


THE MEANING OF MENTAL AGE


Most intelligence tests utilize the concept of mental age
(MA) as a basis for the interpretation of results. The child's
test score is related to the age group of which his score is
typical. For example, Betty has a raw score of 107 on a mental
test. The table of norms indicates that this score is
equivalent to a mental age of ten years two months. This means
that Betty's score on this test is the score typically attained
by children who are ten years two months of age. Betty herself
may be eight or twelve years old chronologically. Her score,
however, is more typical of the age group ten years two months
than of either eight- or twelve-year-olds. Betty, in other
words, achieves at the ten-year two-month norm on this test and
is said to have an MA of ten years two months.

In certain respects the concept of mental age is of more 
value to the teacher than that of IQ, which is, however, more 
commonly used. Mental age approximates the use of grade 
norms on achievement tests; that is, it relates Betty to a spe- 
cific age group in mental ability. This may help the teacher 
to develop suitable expectations for Betty with reference to 
schoolwork. 

The teacher may be able, on the basis of his knowledge of 
children at various age levels, to decide upon materials which 
will be suitable for Betty, regardless of her actual grade place- 
ment. Suppose that she is twelve years old and has a mental 
age of ten years. This does not imply that she is just like chil- 
dren ten years of age; in interests, experiences, social develop- 
ment, physique, and possibly in many respects mentally, she 


will differ from “typical” ten-year-olds. However, in terms of
the mental abilities sampled [...] resembles the ten-year-olds
[...] admittedly rough, but [...] the teacher's dealings with
Betty, and it suggests some possible [...] problems. [...] think
in terms of grade groups [...] this is in part the reason for
[...] information gives the teacher a worthwhile starting point
for his attempts to adapt the curriculum to Betty's
potentialities.

Betty's mental grade placement does not necessarily indicate
a fifth-grade program for her, since she may differ in
many respects from typical ten-year-olds. Although the cur- 
riculum which suits her best will probably involve a level of 
mental functioning comparable to that expected of ten-year- 
olds, differences in experience and level of social develop- 
ment and the fact that she has been in school and has had 
previous contact with a wide variety of curricular materials 
will influence the teacher in choosing the learning materials 
and processes suited to Betty's needs and capacities. This de- 
cision concerning Betty’s program illustrates the principle we 
have already discussed—that test data should be used to sup- 
plement, correct, or confirm the observations made by the 
teacher. The teacher must then look for evidences of interest,
boredom, nervous tension, or enjoyment of suggested tasks
to see how well the materials tentatively being tried are suited
to Betty's needs.

The value of the mental-age concept is further illustrated by
the following approximate equivalents:

Case A: The IQ of pupil A is 100. His chronological age
upon entering school was five years six months; because of his
birth date, he was able to enter school at a relatively early age.
His MA (five years six months) indicates that very possibly
he will not be ready to begin reading during his first year in
school.

Case B: The IQ of pupil B is 90. He was delayed in entering
school for one year because of illness. His chronological
age is seven years two months; his MA is six years five
months. Under ordinary circumstances pupil B is likely to be
ready for reading even though his IQ is lower than that of
pupil A, since mental age is more closely related to reading
readiness than is IQ.

Case C: Pupil C has an IQ of 126. His chronological age is
six years four months; his MA is eight years. Pupil C is likely
to have little difficulty with the first-grade program in reading
and may need more challenging materials than his classmates
require.

The interpretations in the three cases above are clarified
through the use of mental-age rather than IQ concepts. How-
ever, it must be repeated that many factors aside from tested
mental ability enter into a child's readiness for learning. The
possibilities suggested above should be regarded by the
teacher as tentative hypotheses that may have to be changed
as a result of other circumstances in the child's classroom life.
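
The arithmetic behind Cases A, B, and C can be checked with a short sketch. It assumes the ratio definition of IQ that the chapter gives in the section on the meaning of IQ (mental age divided by chronological age, times 100, with both ages in months). The function names are illustrative only and do not come from any test manual.

```python
from typing import Tuple


def to_months(years: int, months: int) -> int:
    """Convert an age stated as years and months into months."""
    return years * 12 + months


def to_years_months(total_months: int) -> Tuple[int, int]:
    """Convert an age in months back into (years, months)."""
    return divmod(total_months, 12)


def ratio_iq(ma_months: int, ca_months: int) -> int:
    """Ratio IQ: mental age over chronological age, times 100, rounded."""
    return round(ma_months / ca_months * 100)


# (pupil, chronological age, mental age) as stated in Cases A, B, and C.
cases = [
    ("A", to_months(5, 6), to_months(5, 6)),
    ("B", to_months(7, 2), to_months(6, 5)),
    ("C", to_months(6, 4), to_months(8, 0)),
]

for pupil, ca, ma in cases:
    years, months = to_years_months(ma)
    print(f"Pupil {pupil}: MA {years}-{months}, IQ {ratio_iq(ma, ca)}")
```

The computed quotients agree with the stated IQs of 100, 90, and 126. Pupil B has the lower IQ but the higher mental age of the first two pupils, which is the point the cases make about reading readiness.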

Again, the teacher must evaluate test materials carefully. 
Mental age is an average based upon a particular sampling of 
mental abilities; it may cut across few or many abilities, and 
it must be interpreted in terms of the test from which it is
derived and the kinds of problems the test requires the pupil to
solve.


THE MEANING OF IQ


The term intelligence quotient (IQ) is firmly established in
the vocabulary of teachers. The IQ is the ratio of a pupil's
mental age to his chronological age and is found as follows:


$$\mathrm{IQ} = \frac{\mathrm{MA}}{\mathrm{CA}} \times 100$$


In order to compute the IQ of a pupil, the MA is derived
from an intelligence test and converted into months. This
term is divided by the chronological age (CA), also
converted into months. The result of this computation is then
multiplied by 100, thus clearing two decimal places.

By way of example, let us suppose that Billy, whose chron-
ological age is ten years four months, has completed an intel-
ligence test which shows him to have a mental age of eleven
years one month. His IQ can be calculated as follows:


CA = 10-4, or 124 months
MA = 11-1, or 133 months


Using the formula given above, Billy's IQ is


$$\frac{\mathrm{MA}}{\mathrm{CA}} \times 100 = \frac{133}{124} \times 100 \approx 107$$


It is evident that the IQ is a ratio of level of mental devel-
opment to chronological age. Hence it indicates the present
rate of mental development or the relative brightness of the
individual. Billy, in the example above, is developing men-
tally at a rate slightly faster than that of the hypothetical aver-
age child, as indicated by his IQ of 107. The child whose
mental and chronological ages are exactly equal has an IQ of
100. Actually, the normal rate of development or average
brightness might possibly be best defined as the IQ range be-
tween 90 and 110.
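
Billy's computation can be sketched in a few lines. The block below follows the steps just described (convert both ages to months, divide, multiply by 100) and then checks the result against the 90-to-110 range mentioned above. The function names are illustrative assumptions, not part of any published scoring procedure.

```python
def age_in_months(years: int, months: int) -> int:
    """Convert an age stated as years and months into months."""
    return years * 12 + months


def intelligence_quotient(ma_months: int, ca_months: int) -> float:
    """Ratio IQ: (MA / CA) x 100, with both ages in months."""
    return ma_months / ca_months * 100


def in_average_range(iq: float, low: float = 90, high: float = 110) -> bool:
    """True when the IQ falls in the range the text treats as average."""
    return low <= iq <= high


billy_ca = age_in_months(10, 4)   # 124 months
billy_ma = age_in_months(11, 1)   # 133 months
billy_iq = intelligence_quotient(billy_ma, billy_ca)

print(round(billy_iq))             # 107
print(in_average_range(billy_iq))  # True
```

Billy's quotient rounds to 107, so although he is developing slightly faster than the hypothetical average child, he still falls within the range suggested here as average brightness.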

Some Cautions. As we have seen, the IQ represents rate of
mental development or relative brightness. The teacher should
remember, however, that basically it is derived from a test
score and is subject to the usual limitations which pertain to
such scores. No test is completely reliable; on similar test ma-
terials a pupil may perform better at one time than at another,
since he may feel better or have a better attitude toward the
test at one time than at another. When he takes the test, he
may be at his best with reference to some types of materials
and at his worst with reference to others. Thus the test at best
can only sample his abilities; it cannot give complete cover-
age. All this indicates that the teacher should employ at least
as much caution in the interpretation of the IQ as in the in-
terpretation of any other test score. In fact, because of the
particular implications which have become attached to the
concept of IQ, the need for cautious interpretation can
scarcely be overemphasized.

Many persons feel that the IQ should be regarded as con-
stant and that therefore one test result, even though it is sev-
eral years old, should be sufficient evidence of the brightness
of a pupil. This is far from the truth. The IQ is only roughly
constant; although it does not ordinarily fluctuate widely
from one test to another, it nevertheless does vary, and dif-
ferences of at least ten points on the usual tests can be ex-
pected. Still larger fluctuations occur, even when similar tests
are used. When tests are administered at intervals of several
years, marked deviations are not at all uncommon.⁵ In fact,
the results of intelligence tests administered during the
preschool or primary-grade period should probably be given
little credence if they are over a year old. Although test
results for older pupils appear to be more stable, it is
doubtful whether they are really dependable over any lengthy
period of time.⁶

IQs derived from different intelligence tests are not
comparable. The fact that two tes[...]telligence" tests does
not mean th[...]tion is not accomplished by a single test
score; it is the result of properly correlating many data.

It is a recognized fact, too, that children grow at different
rates and that these rates vary within individuals. This is as
true with mental growth as it is with any other aspect of
growth. These variations, which may be due to a variety of
factors, innate and environmental, result in sufficient
variation in the IQ that the teacher must exercise care when
applying interpretations to individual cases.

⁵ John P. Zubek and P. A. Solberg, Human Development, New
York: McGraw-Hill Book Company, Inc., 1954, p. 282.

⁶ J. E. Anderson, “The Limitations of Infant and Preschool
Tests in the Measurement of Intelligence,” Journal of
Psychology, 8:351-379, 1939. K. P. Bradway, “IQ Constancy on
the Revised Stanford-Binet from the Preschool to the Junior
High School Level,” Journal of Genetic Psychology, 65:197-217,
1944.


Environmental Influences 


It is commonly assumed that intelligence is inherited. Al- 
though there is evidence that tends to support this point of 
view, it has been demonstrated that environmental conditions 
in the home and community, school attendance, and other fac-
tors influence IQ. Consequently it is perhaps wisest to con-
sider that heredity sets limits to individual potentialities, but
that these potentialities develop in response to environmental
stimulation.

In judging a pupil's ability to learn, the teacher must bear
in mind the fact that his opportunities to develop his abilities,
at least along scholastic lines, may have been very limited.
Given more stimulating environmental conditions, his meas-
ured ability might change considerably. For example,
Wheeler⁷ has shown that the general test level of a mountain
community improved over a period of ten years during which
improvements occurred in the general environment. Wellman
and Pegram⁸ and Skodak and Skeels⁹ have shown that differ-
ences in environmental conditions are related to changes in
tested intelligence.

⁷ L. R. Wheeler, “A Comparative Study of the Intelligence of
East Tennessee Mountain Children,” Journal of Educational
Psychology, 33:321-334, 1942.

⁸ Beth L. Wellman and E. L. Pegram, “Binet IQ Changes of
Orphanage Preschool Children: A Reanalysis,” Journal of
Genetic Psychology, 65:239-263, 1944.

⁹ M. Skodak and H. M. Skeels, “A Follow-up Study of Children
in Adoptive Homes,” Journal of Genetic Psychology, 66:21-58,
1945.

The kind of social and economic environment from which
a child comes influences his tested intelligence markedly.¹⁰
Most intelligence tests tend to favor middle-class and upper-
class children and to discriminate against lower-class chil-
dren; that is, the test items are concerned with materials more
familiar to one group of children than the other. Verbal items
most markedly favor middle-class as against lower-class chil-
dren, whereas nonverbal and pictorial materials discriminate
least against lower-class children. This being the case, the
teacher must consider the child's background in evaluating
test scores. Although test results may indicate the child's abil-
ity to deal with a standard school curriculum, they may not
indicate his potential ability to solve problems in areas related
to his experience.

¹⁰ Kenneth Eells, A. Davis, R. J. Havighurst, V. E. Herrick,
and R. Tyler, Intelligence and Cultural Differences, Chicago:
University of Chicago Press, 1951.


PERCENTILE RANKS 


One of the most meaningful interpretations of mental-test 
results is that which compares the pupil with his own age 
group. Mental age, as we have seen, relates the child to an
age group with ability similar to his own with regard to the
sampling of mental tasks included in a specific test. The IQ
purports to be a more general measure which indicates rela-
tive brightness. This kind of comparison has its limitations,
because an IQ of 115 at age eight is indicative of quite a dif-
ferent level of mental functioning than an IQ of 115 at the
age of twelve or fourteen. Although the quotients are similar, 
the expectations of the teacher must be related to the age 
group to which the pupil belongs; therefore, a measure which 
compares the child with those of his age is more meaningful. 

Comparisons with peer groups are most commonly given in 
the form of percentile ranks. Percentile norms indicate the 
relative standing of the child in defined age or grade groups; 
thus they provide a comparison of the child with others in a 
defined group. In making use of percentile norms the teacher 
should keep in mind that they represent variable units of
measurement, especially when he is tempted to make fine
distinctions in terms of others of like age.
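
The warning about variable units can be illustrated with a small sketch. The scores below are invented for illustration and do not come from any published norms. The definition of percentile rank used here (the percent of the group scoring below a given score, plus half of those who tie it) is one common convention and is an assumption, since the chapter does not state a formula.

```python
from typing import List


def percentile_rank(score: float, group: List[float]) -> float:
    """Percent of the group below `score`, counting ties as half."""
    below = sum(1 for s in group if s < score)
    equal = sum(1 for s in group if s == score)
    return 100 * (below + 0.5 * equal) / len(group)


# Hypothetical raw scores for twenty pupils of the same age,
# bunched near the middle and sparse at the extremes.
scores = [31, 38, 42, 44, 46, 47, 48, 49, 49, 50,
          50, 51, 51, 52, 53, 54, 56, 58, 62, 69]

for raw in (50, 54, 58, 62):
    print(raw, percentile_rank(raw, scores))
```

With these invented scores, a four-point raw-score gain near the middle of the group (50 to 54) moves a pupil from the 50th to the 77.5th percentile rank, while the same four-point gain near the top (58 to 62) moves him only from 87.5 to 92.5. Equal differences in percentile rank therefore do not represent equal differences in performance.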

Norms based on standard scores (see Chapter 4) provide a 
further means of comparing the individual with members of a 
specified group. Norms of this type represent points along a
scale, the units of measurement being equivalent throughout
the length of the scale. Certain IQ measures are based on
standard scores. They provide a means of comparing the
individual's ability with specified age or educational groups.
Since the basic unit, one standard deviation, is similar in
meaning for different tests, norms based on standard scores
provide a precise basis for comparison between tests and
between individuals.
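
A brief sketch may help show why the standard deviation serves as a common unit. The means and standard deviations below are hypothetical norms chosen for easy arithmetic; they are not taken from any actual test. The standard score is computed in the usual way, as the distance of a raw score from the group mean expressed in standard deviations.

```python
def standard_score(raw: float, mean: float, sd: float) -> float:
    """Distance of a raw score from the group mean, in standard deviations."""
    return (raw - mean) / sd


# Hypothetical norms for two different tests given to the same age group.
test_x = {"mean": 50.0, "sd": 10.0}
test_y = {"mean": 100.0, "sd": 16.0}

# One pupil's raw scores on the two tests.
z_x = standard_score(65.0, test_x["mean"], test_x["sd"])
z_y = standard_score(124.0, test_y["mean"], test_y["sd"])

print(z_x, z_y)  # 1.5 1.5
```

Although the raw scores of 65 and 124 look quite different, both lie one and one-half standard deviations above the mean of the respective norm group, so they indicate the same relative standing on the two tests.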


USING INTELLIGENCE-TEST RESULTS 


Intelligence tests, as we have seen, are valuable instru-
ments, but their value is dependent upon competent use and
interpretation. They are tools which may be used or misused.
When carefully selected, administered, and interpreted,
however, they provide the teacher with significant data.

The fundamental aim of the teacher is to assist each pupil
to make the best use of his capacities, and the intelligence test
is perhaps the best general measure of learning capacity with
reference to schoolwork. This is especially true in the more
academic areas such as reading, composition, arithmetic, and
spelling. Mental-age scores permit the teacher to estimate the
mental level at which a child is likely to work most effectively,
and through the use of the IQ and percentile rank, the teacher
is able to formulate expectations regarding the pupil's capacity
to learn in academic areas. In the case of a new pupil or a
new class, such information may be vital to the success of the
program of instruction.

Intelligence-test results are useful, too, in helping the
teacher to diagnose learning disabilities. Is Johnny capable of
doing better work in reading or arithmetic? Is Mary "just 
plain lazy," or is the work too difficult or too easy for her? 
Is Ronny rebellious because he is unable to work at the level 
expected of him, or is there some other reason for his atti- 
tude? Educational problems are seldom simple. The intel- 
ligence-test result is ordinarily, therefore, only one of several 
types of evidence that should be examined in reaching a deci- 
sion about a learning problem. However, it is important that 
capacity for learning, as indicated by test results, be given due 
consideration. 

In making more specific diagnoses, the teacher may wish to
examine the pupil's ability profile. Today, many tests make
such profiles available. Such instruments may provide esti-
mates of ability in areas such as number [...] reasoning [...]
of course, only a rough index of ability to do schoolwork; many
other factors enter into and influence [...] times he may group
them on some other basis for work on units or projects which
offer a variety of tasks requiring varying levels and kinds of
mental ability. In deciding upon the part which the individual
pupil is to play, the teacher must consider his general-ability
level and his special aptitudes as well as his interests and
needs.

The teacher can derive from intelligence-test results an esti-
mate of the range of abilities represented by his class. In a
class in which the range is small, the teacher's program may
differ from that required for a class in which the group has a
wide range of general ability. In other instances the teacher
may find that the general ability of his class is rather low or
rather high by comparison with norms. His expectations for the
group will be influenced by the results of the tests. Clothing
which is too small or too large is usually uncomfortable, and
teacher expectations which do not “fit” the group and the
individual are unlikely to be comfortable and satisfying for
either the teacher or the pupil.


SUMMARY


Among the more urgent and vital problems of the teacher
is that of developing suitable expectations for his pupils, both
as a group and as individuals, in order to adapt the curriculum
to the educational potentialities of individual pupils.
Tests of mental maturity, intelligence, or scholastic aptitudes,
as they are variously called, offer objective evidence that helps
the teacher to make sound judgments about his pupils.

Intelligence tests, to use the customary label, may give a
single measure, called general intelligence, or may provide a
profile of a series of abilities. In either case the teacher’s in-
terpretation of the test results must be based upon a knowl-
edge of the kinds of materials included in the test and upon
the kinds of mental abilities which these materials sample.

In order to give meaning to test scores, results are converted
into mental ages, intelligence quotients, percentile
ranks, or standard scores. Each of these derived measures has
its advantages and its limitations. The mental age relates the
child to an age group in mental ability. The intelligence quo-
tient is a measure of relative rate of mental development, or
of relative brightness, and represents a comparison with the
population on which the test was standardized. Percentile
ranks make possible a comparison of the child with others in 
defined age groups, ordinarily his own age group. Difficulty in 
the interpretation of percentile ranks derives from the fact 
that they represent unequal units of measurement. This dif- 
ficulty is overcome when standard scores are used, and the 
advantage of indicating status with reference to a defined age 
group is preserved. In using any of these mental-test “norms,” 
the teacher must interpret the results in terms of the test ma- 
terials, the sampling of mental abilities represented, and the 
specific conditions of testing. 

Intelligence-test results enable the teacher to help his pupils 
make the best use of their capacities. In adapting the school 
program to individual differences, mental-test results can be 
used to promote sound diagnosis of general or specific learning 
problems, effective grouping of pupils for various purposes in 
the classroom, and judicious gearing of expectations to the 
abilities of the group and the individual. However, tests are 
tools which may be used wisely or carelessly and ineffectively,
for the value of mental-test results in classroom use is
determined by the wisdom of the teacher's interpretations,
judgments, and applications.


STUDY AND DISCUSSION EXERCISES 


1. From your reading and from a study of group mental tests, 
describe the bases upon which you would select a test for use with 
your classroom group at a specified grade level. 

2. Explain: "The results of a test of intelligence which empha- 
sizes verbal abilities may not be completely valid for all pupils 
in a classroom group.” 

3. In some schools, ability grouping on the basis of mental-test
results is used as a means of facilitating instruction. Discuss the
values and limitations of mental tests as a means of grouping
pupils for instruction.

4. The relationship between IQ and school achievement is
often not strong. List a number of reasons for this apparent
discrepancy between ability and achievement.

5. Evaluate the concepts of MA and IQ as partial bases for:

a. deciding on the optimum grade placement of a new pupil
at the elementary school level.
b. estimating the probable readiness for beginning reading.
c. predicting an individual's chances of success in a college-
preparatory program in high school.

6. Mr. S., principal of an elementary school, insists that records
of intelligence-test results in the cumulative records include the
name and form of the test and the date of testing. Miss B. objects
[...] maintaining that the MA or IQ is all that is [...]
Which point of view would you support? What are your [...]

7. Examine a test of general mental ability. Analyze the materials
included in the test and describe the aspects of intelligence upon
which the test results are based.


Chapter 6 contains a discussion and description of individual
tests of ability. Chapter 8 is concerned with the description of 
group tests of ability. 
Micheels, William J., and M. Ray Karnes: Measuring Educational 
Achievement, New York: McGraw-Hill Book Company, Inc.,
1950.
Chapter 2 includes a discussion of tests of scholastic aptitude. 
Various types of items are presented. 
Monroe, Walter S. (ed.): Encyclopedia of Educational Research, 
New York: The Macmillan Company, 1950. 
Pages 608—610 contain a listing and brief discussion of mental 
tests. 
Thomas, R. M.: Judging Student Progress, New York: Longmans, 
Green & Co., Inc., 1954. 
Chapter 5 is concerned with evaluation of mental ability. Illus- 
trative materials are presented. 
Torgerson, T. L., and G. S. Adams: Measurement and Evalua- 
tion, New York: The Dryden Press, Inc., 1954. 
Chapter 4 presents a discussion of mental measurement as a
part of the study of the individual. 
Thurstone, L. L.: Primary Mental Abilities, Psychometric 
Monograph, no. 1, Chicago: University of Chicago Press, 1938. 
Evidence that intelligence consists of a number of abilities is
presented. 


CHAPTER SIX 


Evaluating Pupil Achievement 


The assessment of achievement has long been considered a 
primary responsibility of the school and the classroom teacher.
Traditionally assessment has been concerned with the pupil’s
accumulation of knowledges and skills in such areas as
reading, arithmetic, language, geography, and history. However,
educational goals and values are steadily changing, and con-
cepts of measurement and evaluation are being revised. Al-
though teachers are still concerned with the scholastic
achievements of pupils, evaluation of pupil status and progress
is influenced by changes in our concepts of the larger goals of
education. It is being increasingly recognized that academic
achievements are means to the attainment of educationally
worthwhile goals rather than ends in themselves. This shift
in emphasis implies revisions in the nature of measuring
instruments, in the purposes of measurement, and in the
utilization of the results of measurement.


THE NATURE OF ACHIEVEMENT


Aptitude for learning must be distinguished from achieve-
ment, or the actual learning accomplished. However, in order
to measure aptitude for learning, which is commonly identi-
fied with intelligence, the designer of intelligence tests
must use an indirect approach which involves learning or
achievement. Intelligence cannot be measured directly but
must be inferred from its products or its application to various
types of materials. Hence, in order to accomplish his purpose,
the maker of intelligence tests attempts to discover what the
individual has learned in situations experienced by a vast
majority of persons. Thus he presumes that all persons have
had opportunities to learn in these areas and that, in the
majority of instances, differences in test scores reflect differ-
ences in aptitude rather than in opportunity for learning. An-
other possibility open to the test-maker is to develop situations
that are so completely novel that very few persons are likely
to have had prior experiences in the area being tested. Hence,
mental tests assess the individual's capacity or aptitude only
by inference from his achievements with respect to very com-
monplace or very novel materials and situations. In other
words, the mental test is ordinarily a measure of a very
general type of achievement.

Customarily, the scholastic-achievement test differs from
the general-aptitude test in that results of scholastic tests are
considered to be dependent upon the acquisition of specialized
skills and knowledges, usually as a result of special training.
For example, the individual is provided with opportunities to
learn in such fields as reading, writing, music, spelling, and
arithmetic, and his achievement-test results reflect his attain-
ments in these areas. Basic to the consideration of achieve-
ment are the ideas of opportunity to learn and attainment of
skill or knowledge as a result of learning.

In an intermediate position, between and overlapping the
concepts upon which aptitude and achievement tests are
based, are the so-called readiness tests, which measure the
development of the specific abilities, skills, and knowledges
required for success in a specific course of learning experiences.
Readiness tests, therefore, may tap both general and specific
abilities in areas relating to their special purposes. The tests
of reading readiness widely used in the primary divisions of
elementary schools are examples.

Achievement involves the interaction of the three factors
we have been discussing: aptitude for learning, readiness or
[...] for learning, and opportunity for learning. Other
[...] factors also enter into the achievement concept:
among these are motivation, health and physical fitness,
special aptitudes or disabilities, and emotional characteristics.


MEASURING ACHIEVEMENT 


Teachers have many opportunities to develop achievement 
tests which suit the specific purposes of their classroom
situations. We shall concern ourselves here, however, with
standardized tests of achievement. The special values of
teacher-made tests and the methods of construction of such
instruments are discussed in Chapter 12.

As we have seen, evaluation is a necessary responsibility
of the teacher, and measurement contributes to the process of
evaluation. The measurement of scholastic achievement
provides the teacher with basic data which will be useful to him
in over-all planning, placement of pupils, pupil guidance,
evaluation of the effectiveness of instruction, development of
learning situations suited to the needs and capacities of in-
dividual pupils, evaluation of special needs or disabilities of
pupils, discovery of areas of particular strength, and
development of remedial programs. However, achievement tests
do not automatically solve educational problems, nor do they
necessarily further the attainment of worthwhile educational
objectives. It must be reemphasized that measuring instruments
are tools whose value depends upon the skill with which they 
are utilized. 

Among the most widely used instruments of evaluation are 
achievement-test batteries designed to provide a general sur- 
vey of skill or knowledge attained in specific academic areas. 
Commonly included in these batteries are tests of reading, 
arithmetic or mathematics, spelling, social science, physical 
science, and language or literature.

The basic consideration in the selection of a battery of 
achievement tests is the extent to which the results of testing 
will reflect status or progress relative to worthwhile educational
objectives. The teacher should be familiar with the
achievement batteries available and should study carefully the 
general aspects of the educational program implied by the 
various tests. 

The well-known test batteries are of two principal types: 
those that emphasize acquisition of fundamental skills, as, for 
example, in reading and computation, and those that emphasize 
acquisition of factual knowledge in content areas. A number 
of widely used achievement-test batteries include tests of both 
skills and knowledge. A test battery emphasizing development 
of basic skills at the elementary school level might include 
tests of work-study skills, reading, language arts, arithmetic
skills, and spelling. The test items are oriented toward the 
measurement of skills and processes fundamental in the area; 
in arithmetic, for example, items on number concepts, prob- 
lem solving, and the fundamental processes of addition, sub- 
traction, multiplication, and division might be included. 

Achievement batteries designed to measure knowledge of 
content may include tests in the areas of literature, social 
studies, science, and so on. The test items will be based on 
subject matter suited to pupils at the grade level for which 
the test is designed. At the seventh-grade level, for instance,
items in social studies might concern the voyages of Columbus,
American history, and the geography of North America
and possibly of other continents. A “content” item asks for 
information such as: “When did Columbus sail for America?” 
“Where did he land?” Skill items in the area of social studies 
might involve map reading or interpretation of information 
given in the test. 

Some achievement-test batteries provide extensive surveys 
of educational achievement, and others deal more intensively 
with a limited area. The extensive or survey type of test bat- 
tery typically includes a wide variety of measures, but each 
measure is likely to be based on a limited number of items. 
Economy of administration time and the cost of the test are
essential practical considerations in the development of a
test battery of this type. A survey battery might
include tests in all or most of the following basic areas of
learning: reading, arithmetic, spelling, language, study skills,
social studies, science, and literature. Such extensive sampling
of educational fields is likely to preclude intensive treatment 
in any one area. However, the survey battery provides valua- 
ble information in a relatively economical fashion. 


Other tests, as we have seen, sample intensively a limited
curricular area. For example, if the teacher wishes to make
an intensive study of an area such as reading skills, arithmetic,
social studies, science, literature, or language, tests of more
limited range provide opportunities for intensive analysis.
The teacher will wish to examine the test items in order to
determine whether the various aspects of achievement are
adequately sampled. He will also wish to decide whether the
items require (1) use of acquired skills, (2) use of acquired
facts or information, (3) information or skills suited to the
curriculum his pupils have been following, and (4) responses
which will provide data concerning pupil progress toward the
objectives which he has envisaged for his classroom group.

In many instances the classroom teacher does not have the
opportunity to select the tests he would prefer; frequently he 
is provided with test batteries selected by someone else. This 
situation may not be so hopeless as it seems. Study of the tests 
may reveal specific areas or item groups well suited to pro- 
vide basic evaluative data for his special purposes. Other 
aspects of the test may suggest important emphases which the 
teacher has overlooked. 

It is worthwhile to observe the thought processes required 
of the pupil taking the test. In making such an analysis the 
teacher should inquire whether the items require (1) simple 
memorization of facts, (2) direct utilization of simple skills, 
(3) skill in obtaining factual information (as in the case of 
reference skills), (4) application of facts to the solution of 
problems, (5) ability to draw conclusions or to make infer- 
ences, (6) ability to develop generalizations, (7) ability to 
apply general conclusions or principles to specific situations, 
and (8) ability to see relationships within given sets of data.
Although these possibilities may not ordinarily be considered 
to reflect curricula in the usual sense of the term, they do
represent methods of evaluating achievement relative to
significant educational goals such as those concerned with the
development of skill in the use of higher mental processes.
Through even a cursory analysis of this type, the teacher 
comes closer to an understanding of the test and the philos- 
ophy which it represents. 

The teacher should look upon the test as a tool which, al- 
though it may not be ideally suited to his task, may yet be 
far more advantageous than no tool at all. By way of analogy,
one does not say that a tool has no value at all because it is
merely one of many required to construct a house. The
teacher would do well to consider the achievement test with
the same critical eye with which the carpenter or mechanic
surveys his tools. Which tool among those available is best
suited to the task at hand? In the absence of the ideal tool,
what possibilities are there for adapting those at hand?


Using Achievement Norms 


The standardized achievement test differs from the informal
test in a number of important respects. During the
process of standardization, the test items are subjected to
experimentation and examination to determine their
value as units of measurement. Uniform procedures for
administration and scoring are developed. Relative values or
norms are established as a basis for interpretation of the test
results. These procedures give the standardized test certain
special values unobtainable in informal tests. However, the
processes involved in standardization and large-scale
publication place limitations upon the values of such tests. Thus
both standardized tests and the more flexible, informal tests
have their unique purposes in the evaluation program of the
classroom teacher.

Grade Norms. The type of norm most widely used with
standardized tests is perhaps the grade equivalent or grade
norm. The grade norm, like other types of norms, is a
reference point which the teacher uses as a basis for the
interpretation of achievement-test scores. It represents neither
the level of achievement to be expected of an individual child
nor a standard of achievement for children at a specified grade
level. The norm is merely the score which is typical of the
achievement of a sampling of children at a particular grade
level. In order to derive the grade norm, the teacher scores the
test and accumulates total raw scores for the several areas of
the test under the appropriate headings, for example, reading,
arithmetic, and language. The teacher then refers to the set of
norms provided with the test and records the grade norm
which each score represents. For the test in Figure 8, the
sum of scores in the three areas, that is, the score over the
entire test, could also be converted to the appropriate grade
norm. Figure 8 represents the achievement of a fifth-grade 
pupil on our hypothetical achievement-test battery. The first 
column lists the three subtests and a space for the total score 
for the battery. The second column lists the raw or uncon- 
verted scores, the total number of items answered correctly 
in each subsection. John scored 72 points in reading, 58 
points in arithmetic, 69 points in language, and, by summation,
199 points over the entire battery of tests. In the third
column are the grade norms appropriate to John’s raw scores,
derived from the table of grade norms which accompanied the
test.


THE ACHIEVEMENT TEST, FORM A FOR GRADES 4-7

PUPIL: John        AGE: 10.6       DATE: 2/3/56
SCHOOL:            GRADE: 5.6      TEACHER:

Test          Raw score    Grade norm    Percentile norm
Reading           72          6.3             75
Arithmetic        58          4.0             20
Language          69          5.5             50
Total            199          5.4             47

FIG. 8. A hypothetical test situation.
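
The scoring steps just described (totaling the raw scores and
looking each one up in the publisher's table) can be sketched in
a few lines of Python. This is only an illustration: the table
below contains nothing but the four score pairs shown in Figure 8,
and the names GRADE_NORMS, total_raw_score, and grade_norm are
hypothetical, not part of any published test.

```python
# Hypothetical excerpt of a grade-norm table: raw score -> grade norm.
# Only the four pairs shown in Figure 8 are included; a real table in a
# test manual would list every attainable raw score.
GRADE_NORMS = {
    "Reading": {72: 6.3},
    "Arithmetic": {58: 4.0},
    "Language": {69: 5.5},
    "Total": {199: 5.4},
}


def total_raw_score(raw_scores):
    """Sum the subtest raw scores to obtain the score over the whole battery."""
    return sum(raw_scores.values())


def grade_norm(test, raw_score, table=GRADE_NORMS):
    """Return the tabled grade norm for a raw score, or None if it is not tabled."""
    return table.get(test, {}).get(raw_score)


# John's raw scores from Figure 8.
john = {"Reading": 72, "Arithmetic": 58, "Language": 69}
total = total_raw_score(john)

# Build the "Grade norm" column of the figure.
profile = {test: grade_norm(test, score) for test, score in john.items()}
profile["Total"] = grade_norm("Total", total)
```

A score missing from the table returns None rather than a guessed
value, since interpolation rules differ from one test manual to
another.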

In order to interpret these norms, which are representative 
of John’s achievement, we must ask, “What does the figure 
6.3 represent with reference to the reading test?” The refer- 
ence point 6.3 typifies the performance of pupils who have 
spent three months in the sixth grade. Although at the time 
of testing, John has been in the fifth grade for a period of six 
months, his achievement on this test of reading is at a much 
higher level than the typical performance of pupils of his 
grade; it is more typical of sixth than of fifth graders. On
the basis of the same type of reasoning, we must conclude that
John is experiencing some difficulty with arithmetic, or, at
least, with the type of material represented in our achievement
test. We note that his grade norm in arithmetic is 4.0, which
indicates that his achievement on the arithmetic section of the
test is typical of beginning fourth graders. His scores in
language and his total score are fairly typical of pupils at his
grade level, but analysis of the three test areas indicates a
field of accomplishment in reading and an area of some
difficulty in arithmetic. This interpretation is based upon
comparison of John’s achievement on the test with the typical
attainments of pupils at various grade levels. Thus grade norms
provide the teacher with a basis for comparing the scores of his
pupils with scores typical of pupils at specified grade levels.
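
The reading of a grade norm as "grade and months" can be sketched
as follows. The sketch assumes that each tenth of a grade norm
stands for one school month, as in the reading of 6.3 given above;
the function names are hypothetical, and the month differences are
rough descriptive figures, not precise measurements.

```python
def split_grade_norm(norm):
    """Split a grade norm such as 6.3 into (grade, months spent in that grade)."""
    tenths = round(norm * 10)  # work in whole tenths to avoid float error
    return divmod(tenths, 10)


def months_from_placement(norm, placement):
    """Months by which a grade norm lies above (+) or below (-) the pupil's
    actual grade placement, assuming one tenth equals one school month."""
    return round(norm * 10) - round(placement * 10)


# John's grade placement and grade norms, taken from Figure 8.
placement = 5.6
norms = {"Reading": 6.3, "Arithmetic": 4.0, "Language": 5.5, "Total": 5.4}

differences = {test: months_from_placement(n, placement)
               for test, n in norms.items()}
```

Here differences shows reading above John's placement and
arithmetic well below it, which is the same pattern the text
describes in words.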

Percentile Norms. As we have seen, grade norms make it
possible to compare scores with the performance typical of
pupils at various grade levels. Percentile norms, on the other
hand, indicate the relative standing of a pupil within a
specified age or grade group. Norms of this type help the teacher
answer the question, How well is this pupil achieving by
comparison with others of his age or grade? This comparison
indicates the standing of the pupil with reference to the
group which was used as a basis for establishing the norms.
It has no reference to his status within his own classroom
group.

Before assigning the percentile norms for a particular set
of test results, the teacher should consult the test manual and
study the definitions,¹ descriptions, and directions concerning
norms. In general, the teacher will employ the following
procedures:

1. Sum the scores to derive the various totals for which
percentile norms are provided in the table of norms. In the
example of John in Figure 8, these might be the norms for
reading, arithmetic, and language and the total
achievement-test score.

2. From the tables of percentile norms for the age or grade
level in question, find the norms appropriate to the raw scores
attained by each pupil.

3. Utilizing the definition provided in the manual, interpret
the percentile norms. For example, the manual of the test
John took provides the following customary definition: “A
percentile norm indicates the per cent of scores of pupils at
the specified grade levels which fall below that point.” John’s
score in reading, for example, is equivalent to percentile 75.
In terms of our definition, his score is higher than that of 75
per cent of pupils at his grade level. The teacher will
remember that this norm refers to the group of fifth-grade pupils
whose scores on this test formed the basis for the development
of this particular set of percentile norms. With this
concept in mind, we can conclude that John’s test score
indicates a relatively superior achievement in the area of reading
by comparison with fifth graders generally. Similarly, the
percentile norm of 20, indicating John’s relative achievement in
arithmetic, points to an area of some difficulty, although his
score is better than that attained by 20 per cent of pupils at
his grade level. The other scores may be converted and
interpreted in the same way to give us a picture of John’s
relative achievement status.

¹ For a general discussion of percentile norms, see Chap. 4,
pp. [...]. See John C. Flanagan, “Units, Scores, and Norms,” in
E. F. Lindquist (ed.), Educational Measurement, Washington:
American Council on Education, 1951, pp. 717-719, for a
discussion of the problem of definition with reference to
percentiles.
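
The definition quoted in step 3 (the per cent of norm-group scores
falling below a given point) can be illustrated with a short
Python sketch. The twenty scores below are invented for the
illustration and were chosen so that a raw score of 72 falls at
percentile 75; they are not data from any real standardization
group, and published tables may treat tied scores differently.

```python
def percentile_norm(norm_group_scores, raw_score):
    """Per cent of norm-group scores that fall below raw_score."""
    if not norm_group_scores:
        raise ValueError("norm group is empty")
    below = sum(1 for s in norm_group_scores if s < raw_score)
    return 100 * below / len(norm_group_scores)


# Invented raw scores for a tiny, hypothetical norm group of 20 pupils.
norm_group = [40, 45, 48, 50, 52, 55, 57, 58, 60, 61,
              63, 64, 66, 68, 70, 72, 74, 76, 80, 85]

reading_percentile = percentile_norm(norm_group, 72)
```

Note that the result describes standing within norm_group only; it
says nothing about standing within the pupil's own classroom.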

Age Norms. Publishers of standardized tests of educational 
achievement commonly provide age norms, or age equivalents, 
in addition to grade and percentile norms as a basis for the 
interpretation of test scores. The age norm is interpreted in
much the same manner as the grade norm, but its reference
point is the typical performance of the age group rather than
of the grade group. The age norm derived from
general-achievement tests is frequently termed educational age, or
EA. Age norms related to specific subject areas may be
presented as reading age, arithmetic age, and so on.

Age equivalents are helpful when the teacher wishes to 
compare the reading, arithmetic, language, or educational age 
of a pupil with his chronological age. They also enable the 
teacher to compare the mental and educational ages of pupils 
in his classroom. However, when two sets of results are
compared (mental and achievement-test results, for example),
the teacher is cautioned to remember that the norms are
derived from two distinctly different tests which ordinarily have
been standardized on two different groups of pupils. Again,
the attainment of a pupil in one subject is likely to differ from
his attainment in another. In addition, there is evidence that
intelligence is not a unitary function but is composed of a
number of abilities which are not necessarily closely related
to one another.² Thus age norms of different types are not
so directly comparable as they at first appear to be.

Standard-score Scales. Although standard scores are
frequently presented as a basis for the interpretation of test
results, the more common practice is to utilize standard-score
scales as a basis for the derivation of other types of norms.³
That is, a standard-score scale is developed from the raw
scores, and these standard scores are then used to derive age,
grade, or percentile norms. Such procedures are designed to
utilize some of the particular advantages of standard scores.
(See Chapter 4, pp. 57-63.)

² L. L. Thurstone, Primary Mental Abilities, Psychometric
Monographs, no. 1, Chicago: University of Chicago Press, 1938.

³ See, for example, Metropolitan Achievement Tests, Yonkers,
N.Y.: World Book Company, 1946, 1947.


RELATING ABILITY AND ACHIEVEMENT


Recognition of the fact that pupils of a given grade do not 
all have the same capacity to achieve in any subject area has 
led to efforts to group pupils in such a way as to reduce the 
range of individual differences and to adapt curricular mate- 
rials and teaching methods to the needs and capacities of in- 
dividual pupils. Evaluative procedures have been designed to 
relate capacity and achievement on the assumption that an 
achievement “expectancy” can be developed for the individual 
pupil or for a group of pupils. 

Investigations indicate that a wide range of achievement
levels is to be expected in any classroom with reference to
almost any curricular area.⁴ Other investigations indicate that
the range of individual differences in achievement is not
significantly reduced by the practice of failing slow-learning
pupils.⁵

Grouping according to general ability is of questionable
value in reducing the range of individual differences, since
individuals differ so markedly in specific abilities that the
over-all range of differences is not likely to be markedly
reduced by such practices.⁶ The conclusion of investigators as
summarized by Cook indicates that the extent of variation in
capacity and achievement would not be markedly altered if
pupils were classified by chronological age rather than by
grade level.⁷

⁴ Walter W. Cook, “The Functions of Measurement in the
Facilitation of Learning,” in E. F. Lindquist (ed.), Educational
Measurement, Washington: American Council on Education, 1951,
pp. 10-15.

⁵ Walter W. Cook, Grouping and Promotions in the Elementary
School, Minneapolis: University of Minnesota Press, 1941. See
also Cook, “The Functions of Measurement in the Facilitation of
Learning,” in Lindquist (ed.), op. cit., pp. 11-13.

⁶ Clark L. Hull, “Variability in the Amount of Different Traits
Possessed by the Individual,” Journal of Educational Psychology,
18:97-104, 1927. See also A. D. Hollingshead, An Evaluation of
the Use of Certain Educational and Mental Measurements for the
Purpose of Classification, Contributions to Education, no. 302,
New York: Teachers College, Columbia University, 1928; and
Marvin Y. Burr, A Study of Homogeneous Grouping, Contributions
to Education, no. 457, New York: Teachers College, Columbia
University, 1931.

In developing measures of expectancy of achievement, it 
has frequently been assumed that capacity to learn, as meas- 
ured by intelligence tests, is closely related to achievement 
status. Summaries of a number of investigations of the actual
relationships between measured intelligence and achievement
reveal that although some relationship is evident,⁸ it is far
from sufficient to be used as a basis for grouping pupils.
Furthermore, the degree of relationship between measured
capacity and achievement varies from one subject-matter area
to another.⁹ The findings referred to indicate that differences
between expectancy and achievement are customary. 

In the attempt to answer the question, “Is this pupil
working at the level of his ability?” the teacher may be tempted
to relate directly mental-age norms and achievement-age
norms for the pupils in his classroom. The accomplishment
quotient (AQ) has been proposed as a single index of this
relationship. It is derived by dividing the age norm obtained
from an achievement test (EA) by the age norm obtained
from a mental test (MA) and multiplying the result by 100.
The formula for derivation of the AQ is:

$$\mathrm{AQ} = \frac{\mathrm{EA}}{\mathrm{MA}} \times 100$$

If the mental and educational ages of a pupil are identical,
his AQ is 100. He would then be considered to be achieving
at the level of his capacity. If the MA of a pupil exceeded
his EA, his AQ would be less than 100 and he would be
considered to be achieving at a lower level than could be
expected. If, on the other hand, the EA of a pupil exceeded
his MA, his AQ would be over 100 and he could be
considered to be achieving at a higher level than could
reasonably be expected.

⁷ Cook, “The Functions of Measurement in the Facilitation of
Learning,” in Lindquist (ed.), op. cit., pp. 10-14. See also Quinn
McNemar, The Revision of the Stanford-Binet Scale, Boston:
Houghton Mifflin Company, 1942, pp. 26-28.

⁸ J. M. Stephens, Educational Psychology, New York: Henry Holt
and Company, Inc., 1951, pp. 228-231.

⁹ Ibid., pp. 228-229.
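
The arithmetic of the AQ can be sketched in Python as follows.
The sketch illustrates the formula only and is not an endorsement
of the index; the ages used are invented, both ages are assumed to
be expressed in the same unit (months here), and the function names
are hypothetical.

```python
def accomplishment_quotient(ea, ma):
    """AQ = (EA / MA) * 100, with both ages in the same unit (e.g., months)."""
    if ma <= 0:
        raise ValueError("mental age must be positive")
    return 100 * ea / ma


def describe(aq):
    """Conventional reading of an AQ relative to 100."""
    if aq < 100:
        return "below expectancy"
    if aq > 100:
        return "above expectancy"
    return "at expectancy"


# Invented examples: (educational age, mental age) in months.
examples = {"A": (120, 120), "B": (108, 120), "C": (132, 120)}
quotients = {name: accomplishment_quotient(ea, ma)
             for name, (ea, ma) in examples.items()}
```

With these invented ages, pupil A has an AQ of 100, pupil B an AQ
below 100, and pupil C an AQ above 100.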

For a number of reasons, the use of the AQ in relating 
ability and achievement is not recommended. Among these 
reasons are the following: 

1. It is doubtful whether the relationship between achieve- 
ment in different academic areas is sufficiently high to warrant 
the use of a composite or average index, such as EA or AQ, 
for this type of comparison. 

2. The AQ is unreliable. 

3. EA and MA should be expressed in comparable units 
and in terms of norms from comparable samples if a ratio of 
the two is to be developed. Ordinarily this is not the case. 

4. There is sufficient evidence of variation among aspects 
of intelligence within the individual to make the use of the 
general mental-age index of questionable value in predicting 
achievement in specific subject areas. 

5. Investigations of the existing relationships between in- 
telligence-test results and achievement-test results fail to in- 
dicate the existence of such a direct relationship as that upon 
which the AQ is based. 

6. There is a general tendency for the average AQ of 
pupils of high ability to show underachievement, whereas low- 
ability pupils in general appear to be overachieving according 
to their AQs.

In comparing measured ability and achievement, the teacher
should use caution in drawing conclusions from average
measures such as MA and EA. Obtained differences present
certain facts with only limited reliability, but these facts may help
the teacher frame hypotheses as to the reasons for the
difference in the case of the individual pupil. By testing these
hypotheses carefully, the teacher may reach conclusions
which will give him direction in his work with individuals.


THE ANALYSIS AND DIAGNOSIS OF 
LEARNING PROBLEMS 


The total score obtained from an achievement-test battery 
gives the teacher some idea of the pupil's accomplishment. No 
analysis of specific areas of achievement is indicated; the
result is analogous to the remark of the patient who tells his
doctor, “I don't feel well.” Such a statement of fact is of
relatively little value except as an indication of a need for
assistance. The medical practitioner is likely to ask questions
designed to analyze the situation, such as, “Do you have
pain?” “Where is the pain located?” “When do you feel this
way?” “How long have you felt this way?” The answers to
these questions concerning the patient's condition may result 
in a specific diagnosis of his problems. 

Similarly, subject scores on an achievement-test battery 
may help the teacher locate and diagnose learning problems. 
An analysis of achievement in the following fundamental 
curricular areas is readily available to the teacher, as indicated
by the titles of the subtests of the Metropolitan Achievement
Test:¹⁰ (1) reading, (2) vocabulary, (3) arithmetic
fundamentals, (4) arithmetic problems, (5) language usage, and
(6) spelling.

Since norms are commonly provided for each area as well
as for the entire test, such a test makes it possible for the
teacher to analyze the relative strengths and weaknesses of
his class and of individual pupils. Although this type of
analysis is very general in nature, it may be useful in indicating
areas requiring special emphasis. In this way, study of the
test results can increase the effectiveness of the educational
program.

¹⁰ Metropolitan Achievement Tests, Elementary Battery, Yonkers,
N.Y.: World Book Company, 1948.

However, detailed diagnostic procedures must be based 
upon a more specific analysis of test results. An analysis of 
the processes required in such areas as arithmetic, language, 
or reading indicates that numerous skills are involved. Arith- 
metic fundamentals, for example, involve addition, subtrac- 
tion, multiplication, and division. A test in arithmetic funda- 
mentals subdivided according to these four areas will reveal 
strengths or weaknesses which the teacher should take into 
account. 

Exact diagnosis of learning problems may involve even 
more specific analysis of learning difficulties. For instance, the 
process of subtraction in itself involves a number of skills, 
any of which may represent a point of difficulty for the in- 
dividual pupil. The following list indicates some possibilities 
of analysis within the process of subtraction:¹¹ simple
combinations; borrowing; zeros; subtracting money; subtracting
numerators; common denominator; whole from mixed num- 
bers; borrowing, mixed numbers; fractions and decimals; 
writing decimals; and denominate numbers. Analysis of test 
results at this level of specificity helps the teacher to determine 
more exactly the particular problems his students are experi- 
encing in the process of subtraction. 
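
An analysis at this level amounts to tallying a pupil's item
results skill by skill. A minimal Python sketch follows; the item
results are invented, the skill labels are taken from the list
above, and the function name results_by_skill is hypothetical.

```python
from collections import defaultdict


def results_by_skill(item_results):
    """Group (skill, correct) pairs into {skill: (number correct, number of items)}."""
    tally = defaultdict(lambda: [0, 0])
    for skill, correct in item_results:
        tally[skill][0] += int(correct)  # count correct responses
        tally[skill][1] += 1             # count items attempted for this skill
    return {skill: tuple(counts) for skill, counts in tally.items()}


# Invented item results for one pupil on a subtraction test.
pupil_items = [
    ("simple combinations", True),
    ("simple combinations", True),
    ("borrowing", False),
    ("borrowing", False),
    ("borrowing", True),
    ("zeros", False),
    ("zeros", True),
]

summary = results_by_skill(pupil_items)

# Skill with the lowest proportion of correct responses.
weakest = min(summary, key=lambda s: summary[s][0] / summary[s][1])
```

With only two or three items per skill, as in this invented
example, such a tally can do no more than suggest where a closer
look is warranted.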

Analysis of the scores from general-achievement batteries,
therefore, will provide the teacher with information in
proportion to the specificity of the results. However, the number
of items which involve a specific process must be strictly
limited in a general-achievement test, and the results for any
small group of items are likely to be statistically unreliable,
although they may be suggestive of possible difficulties. To
provide opportunity for more comprehensive analysis,
standardized diagnostic tests or tests especially developed by the
teacher may be utilized when a general area of difficulty has
been identified or when intensive analysis of a survey test of
achievement has indicated a possible source of difficulty.
When pupils appear to be experiencing difficulty in a specific
area of arithmetic, as, for example, in multiplication of whole
numbers, the teacher may administer a test designed for
intensive analysis of this area.¹² Since a test so designed
provides numerous items related to each skill required in the
process under study, the results are more clearly indicative
of the specific difficulty the pupils are experiencing than are
the results of more generalized tests.

¹¹ E. W. Tiegs and W. W. Clark, California Achievement Tests,
Intermediate Battery, Los Angeles: California Test Bureau, 1950.

Although this discussion has been focused upon the use of 
standardized achievement tests, adequate analysis and diag- 
nosis of educational problems ordinarily involves considera- 
tion of the mental abilities, educational background, health 
and physical status, environment, and emotional status of the 
pupil. Analysis of achievement-test results, however, may play 
an important role in educational diagnosis and the establish- 
ment of effective instructional and remedial procedures. The 
value of the tests as tools in this process will reside in the skill 
with which the test results are utilized by the teacher in form- 
ing hypotheses to guide his work with his pupils. 


SUMMARY 


Perhaps the most widely used instruments of educational
evaluation are achievement-test batteries, which are designed
to provide a general survey of attainment of academic skills
and knowledges. For the teacher, the basic consideration in
the selection and utilization of such test batteries is the extent
to which the results of testing reflect pupil progress relative
to worthwhile educational goals.

¹² L. J. Brueckner, Diagnostic Tests and Self-helps in Arithmetic,
Los Angeles: California Test Bureau, 1955.

In selecting and preparing to use a standardized achieve- 
ment test, the teacher should examine the organization of the 
test and the various types of skill, knowledge, and thought
required of the examinee. This study will help the teacher to 
relate the results to the educational goals which he has formu- 
lated for his pupils. The test is in reality a tool which may be 
useful to varying degrees as an instrument providing evalua- 
tive data. Its value is largely dependent upon the skill with 
which it is used. 

The tables of norms which accompany standardized 
achievement tests provide the teacher with a basis for in- 
terpreting the test results. Grade norms or grade equivalents 
indicate roughly the status of the pupil with reference to scores 
typical of various grade groups. Percentile norms make it pos- 
sible for the teacher to compare the pupil with others of the 
same grade or age who comprised the standardization
population. Age norms, or age equivalents, provide a basis for
comparison of pupil performance with that of various age 
groups. Where standard scores are presented, the teacher is 
provided with scales representing equal units of measurement. 
Such scales are of value because they make possible com- 
parisons among test batteries. 

Although various methods of relating general ability and
achievement have been proposed, such comparisons are
generally hazardous. Some relationship exists between
measured ability and achievement, but measured ability to learn
is not a sufficient base for establishing an expectancy of
achievement.

The teacher may conduct analyses and attempt to diagnose
learning problems at varying levels of intensity. Analysis
helps the teacher to formulate hypotheses as to possible causes
of learning difficulties and provides some basis for
instructional procedures designed to overcome them. The more
specific and reliable the analysis, the more likely it is that exact
sources of learning difficulties will be located. However,
diagnosis must ordinarily be based upon observations more
comprehensive in nature than the results of an achievement-test
battery. Intensive study of achievement-test results may,
however, help the teacher derive maximum benefit from these
instruments as aids to instructional planning and procedures.


STUDY AND DISCUSSION EXERCISES 


1. Suggest some possible reasons why measured ability and 
achievement are not so closely related as might be expected. 

2. Study a general-achievement battery and outline the educa- 
tional philosophy it appears to represent. 

3. Select a subdivision of an achievement-test battery and list 
the detailed skills and knowledges which pupils must possess in 
order to succeed in the selected test area. 

4. What do you consider to be the teacher’s responsibilities in 
utilizing standardized tests of achievement in the classroom? 

5. What are some of the purposes which you as a teacher 
might have in mind when planning to administer a general- 
achievement-test battery? 


6. Analyze the specific skills required of the pupil in
performing any one of the following functions: (a) two-column
addition; (b) multiplication of two-digit numbers; (c) punctuation,
including quotations; (d) locating significant physical features
on a map of South America; and (e) locating reference materials
in an encyclopedia.


SUGGESTED ADDITIONAL READINGS


Buros, Oscar K. (ed.): The Fourth Mental Measurements Yearbook,
Highland Park, N.J.: The Gryphon Press, 1953.
Includes a listing and evaluation of currently available tests of
educational achievement.

Freeman, F. S.: Theory and Practice of Psychological Testing,
New York: Henry Holt and Company, Inc., 1950, chap. 11.
This general discussion of the measurement of educational
achievement includes a presentation of samples of the contents
of representative tests in this area.

Gates, A. I., A. T. Jersild, T. R. McConnell, and R. C. Challman:
Educational Psychology, 3d ed., New York: The Macmillan
Company, 1948, pp. 543-552, 561-567.
Presents a general account of the appraisal of pupil progress
through the use of tests and includes suggestions for approaches
to educational diagnosis.

Goodenough, F. L.: Mental Testing, New York: Rinehart &
Company, Inc., 1949, chap. 22.
In connection with this chapter the author presents a considered
account of the problems of relating ability and accomplishment.

Greene, E. B.: Measurements of Human Behavior, rev. ed., New
York: The Odyssey Press, Inc., 1952, chap. 7.
This chapter includes a discussion of the measurement of
educational achievement and presents examples of measurement
possibilities in a variety of academic areas.

Jordan, A. M.: Measurement in Education, New York: McGraw-Hill
Book Company, Inc., 1953, chaps. 5-13.
These chapters present a detailed account of methods and
instruments for the measurement of attainment in a wide variety
of subject areas.


CHAPTER SEVEN 


Estimating Readiness for Learning 


Children differ in their readiness to undertake learning tasks.
To be effective, instruction must take into account individual
differences in the abilities, skills, and knowledges which are
requisite to successful learning experiences. In certain curricular
areas, many of these abilities, skills, and knowledges
have been defined and readiness tests have been devised to
measure them. Such tests provide the teacher with data about
his pupils which are helpful in planning learning experiences.


READINESS AND LEARNING 


The term readiness may be applied to any aspect of physical,
mental, emotional, or experiential maturity which is
requisite to a learning task. A child learns best when he is
"ready." He does not learn, or learns slowly and with difficulty,
when he lacks the necessary maturity.

Although the readiness concept has been most commonly
applied to school beginners, it holds for all grades and age
levels and for all types of subject matter. For example, it is
possible to define requisite abilities, skills, and knowledges
in an area as specific as two-place division at the fifth-grade
level¹ or in such general areas as foreign languages, advanced
mathematics, or science. The prognostic or special-aptitude
tests designed to predict success in certain of these curricular
areas are closely related to the readiness tests which are
utilized with school beginners.

There appear to be optimal mental ages for learning such
aspects of arithmetic as addition, addition and subtraction of
like fractions, and long division.² A readiness test has been
found useful in predicting success or failure in arithmetic.³
Mental age has been demonstrated to be related to chances of
success in beginning reading, and the results of readiness tests
and rating scales designed to measure reading readiness indicate
likelihood of success or failure in this task.⁴

These experimental results serve to illustrate the fact that
a detailed understanding of the requirements of the learning
task, together with an accurate evaluation of the readiness of
the learner, may be very valuable to the teacher in curriculum
planning. This does not mean, however, that the teacher
should stand idly by waiting for the children to become
"ready" for learning. Such skills as arithmetic, reading, and
writing develop only as a result of learning. Carefully planned
instruction can often enhance readiness for learning in such
areas.⁵ Evaluation of readiness is possible and readiness programs
can be planned in any area of learning in which the
requisite skills, abilities, and knowledges can be differentiated.
The manuals of directions for readiness tests frequently offer
suggestions which may form the basis for such programs.

¹ W. A. Brownell, "Arithmetical Readiness as a Practical Classroom
Concept," Elementary School Journal, 52:15-22, 1951.

² C. W. Washburne, "Mental Age and the Arithmetic Curriculum,"
Journal of Educational Research, 23:210-231, 1931. See also L. B.
Ames and F. L. Ilg, "Developmental Trends in Arithmetic," Journal
of Genetic Psychology, 79:3-28, 1951.

³ L. J. Brueckner, "The Development and Validation of an Arithmetic
Readiness Test," Journal of Educational Research, 40:496-502,
1947.

⁴ William Kottmeyer, "Readiness for Reading," Elementary English,
24:355-366, 528-535, 1947.

⁵ C. M. Scott, "An Evaluation of Training in Readiness Classes,"
Elementary School Journal, 48:26-32, 1947.


READINESS TESTS 


Like other tests, readiness tests may be general or specific 
in nature. In developing a readiness test, the test-maker ana- 
lyzes the learning activities involved in the subject area, at- 
tempting to define the components of the background requisite 
to effective learning. He then designs test scales and items 
which provide estimates of pupil performance in these areas. 
As with other standardized tests, a careful study is made of 
the results of testing, revisions may be made in the original 
test, and final forms of the test are developed. Norms are 
then established and indications of the practical values and 
possibilities are presented in the manual of directions.

The Metropolitan Readiness Tests may serve as an example of a
general readiness test. The battery was designed to measure the
traits and achievements of school beginners that contribute to their
readiness for first-grade instruction, and it does not require
the ability to read.⁶ The tests include measures of comprehension
of language, including phrases, sentences, and vocabulary;
visual activities involving perception of similarities;
tests of number knowledge; and a copying test which provides
a measure of visual perception and motor control. The authors
believe that the results of the test may be of value in
estimating readiness for reading, arithmetic, and writing. The
manual of directions presents evidence of the value of the tests
in predicting first-grade achievement as measured by the Primary
I Battery of the Metropolitan Achievement Tests.


⁶ Metropolitan Readiness Tests: Directions for Administration,
Yonkers, N.Y.: World Book Company, 1949, p. 1.


ILLUSTRATIVE READINESS TEST ITEMS*
From the Metropolitan Readiness Tests

Test 3. Information
9. Mark the thing to travel in across the ocean.

Test 4. Matching
Look at the picture in the middle with the frame drawn around it.
Find another picture just like it and draw a frame around it.

Test 5. Numbers
2. See the row of hats. Mark the middle hat.

Test 6. Copying
1. You are to copy every picture in this column.

* Selected from Gertrude H. Hildreth and Nellie L. Griffiths,
Metropolitan Readiness Tests, Form R, Yonkers, N.Y.: World Book
Company, 1949. (Directions adapted from the manual of directions.)


A number of tests specific to reading readiness have been
devised. The Monroe Reading-aptitude Tests may serve as an
illustration of this type of instrument.⁷ The following tests,
none of which require reading, are included:

1. Visual tests designed to detect perceptual reversals and
to measure ocular-motor control and memory for forms.

2. Motor tests of speed, steadiness, and writing.

3. Auditory tests which indicate abilities in word discrimination,
sound blending, and auditory memory.

4. Articulation tests designed to evaluate speed and level
of articulation ability.

5. Language tests which include measures of vocabulary,
classification, and sentence ability.

6. Laterality tests which indicate hand, eye, and foot preference.


Administration 


The following rules should be carefully observed in giving 
a test: 

1. The teacher should be thoroughly acquainted with the 
tests and with the detailed instructions and directions given 
in the test manual before attempting to administer the tests. 

2. The test should be given in a quiet room and interrup- 
tions and disturbances should be avoided during testing time. 

3. Small children should be tested in small groups. 

4. It is important to test children at a time when they are
not fatigued or overly excited. Pupil attitudes in the testing
situation are an important consideration in interpreting test
results. It is therefore necessary to do everything possible to
ensure a cooperative attitude on the part of the pupils being
tested.

5. Where group tests are used, the testing should be done
by the children's own teacher. This is especially important in
the case of young children, who are sometimes disturbed by
the presence of unfamiliar persons.

6. Short testing periods are necessary for young children.
It is preferable to test over several short periods rather than
to attempt to administer the entire test at one long sitting.

7. Pupils should be seated comfortably and in such a manner
that they are not easily disturbed by one another.

8. In testing small children, the names and other data required
should be filled in by the teacher before the testing
begins.

9. The printed directions for administering the test should
be followed implicitly, since any marked deviation from these
instructions is likely to invalidate the results.

⁷ Marion Monroe, Reading-aptitude Tests, Boston: Houghton Mifflin
Company, 1935.


Interpreting the Results 


Norms are provided as the basis for interpretation of the
results of readiness tests. These norms may be of various
types. For example, the norms for the Monroe Reading-aptitude
Tests are presented in the form of percentile ranks. Norms of this
type are presented for the total test and for each of the five
scales, which include measures of visual, auditory, motor,
articulatory, and language abilities. As a result of experience in
using the test in connection with other measures, Monroe suggests
interpretation of the results in terms of probable student
status as superior, average, or retarded. She also proposes
methods calculated to overcome a variety of difficulties which
pupils might encounter in beginning reading.⁸

A somewhat different approach to interpretation makes use
of raw scores, presenting the "probable per cent of failure"
in terms of the test results.⁹ Still another test manual presents
percentile ranks, letter ratings, and readiness status and offers
interpretations of letter ratings and readiness status.¹⁰

⁸ Marion Monroe, Manual of Directions, Reading-aptitude Tests,
Boston: Houghton Mifflin Company, 1935.

⁹ J. M. Lee and W. W. Clark, Manual of Directions, Lee-Clark
Reading-readiness Test, Los Angeles: California Test Bureau, 1943.
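
The notion of a percentile rank may be made concrete with a brief
sketch. The Python below is an illustration only, under stated
assumptions: the norm-group scores are invented, the function name
percentile_rank is hypothetical, and counting half of the tied
scores is one common convention, not necessarily the one followed
in any particular test manual.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(norm_scores, raw_score):
    """Per cent of the norm group below raw_score, counting half of the ties."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, raw_score)            # scores strictly lower
    ties = bisect_right(ordered, raw_score) - below    # scores exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical norm-group raw scores; not taken from any published manual.
norm_group = [12, 15, 15, 18, 20, 22, 22, 22, 25, 30]

for score in (12, 22, 31):
    print(score, percentile_rank(norm_group, score))
```

A pupil whose raw score falls at a low percentile rank in such a
table stands below most of the norm group, but the rank says nothing
about the reasons for that standing.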

As we have seen, test scores and norms provide valuable
information, but interpretations of test results must be based
upon a recognition of the limitations of the measuring instrument.
Readiness tests provide information covering a limited
range of abilities and skills. Other factors which enter into
readiness for a learning task must ordinarily be estimated by
means of other devices or by observation. Furthermore, since
it is difficult to get accurate test results with young children,
the results of readiness tests used with preprimary or primary-grade
children should be interpreted with due regard not only
for the limitations of the instrument but also for the difficulties
of testing young children.


UTILIZING THE RESULTS OF TESTING 


In actual practice, readiness-test results provide the teacher
with data that are valuable when used in conjunction with
other information about the children in his classroom. This
additional information might well include evaluations of mental
age, emotional and social adjustment, health, vision, hearing,
speech and language development, experiential background,
ability to solve problems, sense of sequence and relationships,
attention, memory, motor ability, handedness, and
eyedness. When used in conjunction with such data, readiness-test
results may be useful as (1) an aid in estimating the
readiness of a pupil to do the work included in the area of
testing; (2) an aid in grouping pupils for instructional purposes;
(3) an aid in analysis of instructional needs of a
preparatory nature; (4) an aid in estimating the strengths
and weaknesses of pupils in areas fundamental to the learning
task; (5) a guide to the teacher in adapting instruction to the
needs of the group and the individual; and (6) a source of
significant data useful to the teacher in planning work with
pupils.

¹⁰ G. H. Hildreth and N. L. Griffiths, Metropolitan Readiness Tests,
Yonkers, N.Y.: World Book Company, 1948.

Readiness tests provide at least a partial basis for estimat- 
ing the pupil's chances of success in a particular area of learn- 
ing, for diagnosing specific weaknesses, and for helping the 
teacher to plan a course of preparatory learning. That is, if 
a pupil achieves a relatively low score on a test of word com- 
prehension, he would benefit from a program of experiences 
designed to increase ability in this area before he begins a 
reading program. A low score on a test of knowledge of num- 
bers or number concepts indicates the need for experiences 
in this area preparatory to studying arithmetic. Low scores on 
tests of vision or hearing may indicate the need for special 
medical examinations or for a program designed to improve 
visual or auditory skills. 

Although many of the skills required by school curricula 
may be developed to some extent by carefully planned instruc- 
tion, readiness is in part a matter of maturation rather than 
of learning, and maturational factors are only in part ame- 
nable to instruction. Hence the teacher may discover that cer- 
tain pupils in his group appear to be unable to profit sig- 
nificantly from a program directed toward preparation for a 
learning task. Certain general skills, however, may be improved
where the activities involved are geared to the child's
capacity. The development of such skills may be enhanced
through activities designed to encourage:

1. Alert listening.
2. Ability to follow instructions and directions.
3. Keenness of observation, as in discrimination of likenesses
and differences, perception of form, quantity,
size, color, and so on.
4. Questions, conversations, evaluations, judgments, and
sharing of experiences.
5. Development of meanings through varied planned experiences.
6. Active attention, recall, and organization of meaningful
materials.
7. Development of skills of observation and interpretation
of pictures, events, and materials.
8. Development of skills in problem solving, planning, and
construction.
9. Development of motor skills related to the various learning
tasks.


Harrison¹¹ has proposed that reading readiness may be influenced
by instruction which fosters:

1. Extension of meaningful concepts.
2. Extension of spoken vocabulary.
3. Accurate enunciation and pronunciation.
4. A desire to read.
5. Correct use of simple sentences.
6. Ability to do problematic thinking.
7. Ability to keep a series of events in mind.

Carter and McGinnis¹² present the following list of activities
to facilitate reading readiness:

1. Looking at picture books and telling stories suggested by
the pictures.

2. Dramatizing children's stories.

3. Telling short stories in response to questions.

4. Listening to children's stories and poems.

5. Telling, in response to questions, what happened in a familiar
story.

6. Making scrapbooks of pictures taken on a vacation trip;
telling group of trip.

7. Looking through magazines for interesting pictures and
later telling original stories about them.

8. Bringing interesting books and phonograph records from
home to be shared with the group.

9. Sharing juvenile books with others; understanding that
books belong to children and that they may be used by
them.

10. Telling of one's own experiences before a group.

¹¹ M. Lucile Harrison, Reading Readiness, rev. ed., Boston: Houghton
Mifflin Company, 1939, p. 6, Fig. 1.

¹² H. J. L. Carter and D. J. McGinnis, Learning to Read, New
York: McGraw-Hill Book Company, Inc., 1953, pp. 63-64. By permission.



Although these suggestions for preparatory activities have 
reference specifically to reading, they may, in principle, be 
applied to other areas of instruction. For example, activities 
designed to stimulate children's interest in and develop mean- 
ingful concepts related to arithmetic could include dramati- 
zations, play stories, games, story telling, picture interpreta- 
tions, and manipulation of concrete materials, all involving 
quantities. In the fields of social studies and science, meanings
are developed in terms of experiences which can be preparatory
as well as directly instructional in nature.

The teacher, however, will recognize that certain aspects 
of readiness cannot be learned but are dependent upon mat- 
urational characteristics. That is, there is a period in a child's 
development when it becomes possible for him to undertake 
certain tasks; prior to this time, these tasks may be difficult 
or perhaps impossible. Training in the absence of the required 
maturation is not effective, and a child may be as unprepared 
for some aspects of a readiness program as he is for the learn- 
ing task which is envisaged as the outcome of the training. 
However, school is not necessarily a loss for those children 
who seem to be markedly below the stage of readiness for 
some types of learning; they may be gaining in social skill and 
in familiarity with new surroundings or broadening their experiences
so that when they do become ready for reading and
arithmetic they will have somewhat fewer concomitant adjustments
to make.


SUMMARY 


Children differ in many characteristics that influence their 
readiness to undertake the variety of learning tasks involved 
in the school curriculum. In certain areas of learning, partic- 
ularly in the areas of reading and arithmetic, the specific tasks 
have been defined and tests have been devised to measure the 
extent to which the child possesses the required skills. Such 
tests are generally called readiness tests, and most of them 
have been devised for and used with school beginners. The 
concept of readiness, however, does not necessarily apply to 
this age group alone, nor to only one or two areas of learning. 

Ideally, teacher estimates of readiness should include men- 
tal, motor, social, and emotional maturity as well as the spe- 
cific abilities, skills, and knowledges required for the learning 
task. It has been demonstrated, however, that the results of 
readiness tests are predictive of success in such areas as read- 
ing, and that readiness programs designed to prepare the child 
for the learning task to come do increase the child’s chances 
of success. 

In his selection of a readiness test the teacher should be 
guided by the needs of his group. The extent to which the 
test will enable the teacher to analyze the skills, abilities, and 
achievements involved in learning the skill will be an im- 
portant consideration. The teacher should create the best pos- 
sible conditions for administering the test and should follow 
the standard directions implicitly if the results are to be in- 
terpreted in terms of the norms and classifications included in 
the test manual. Typically, readiness-test manuals provide the
teacher with many suggestions for the interpretation and
utilization of results.

If test results are to be used for more than screening purposes,
the teacher may plan a program of learning experiences
designed to develop the skills and knowledges which will
prepare the child for the more formal instruction to follow.
But many factors other than specific skills will probably enter
into the teacher's consideration in planning such a program,
since maturational factors may impose a limitation upon the
effectiveness of specific instruction.


STUDY AND DISCUSSION EXERCISES 


1. Outline as specifically as you can the skills which you feel 
are involved in one of the following activities: writing, addition of 
simple fractions, map reading, library reference, use of the dic- 
tionary, or some other specific learning task in the area of your 
interest. 

2. What activities are you able to devise which might represent 
a readiness program for the learning task which you have analyzed 
above? 

3. Indicate the characteristics which you would study in esti- 
mating a child's readiness for reading. Suggest the possible sig- 
nificance of each in relation to the child's chances of success in 
your reading program. 

4. In what ways might the results of selected reading-readiness 
tests be utilized in developing a readiness program for school be- 
ginners? 

5. Suggest reasons for using mental tests as part of the readi- 
ness battery. If possible, validate your conclusions by reference to 
the literature on readiness. 

6. It has been suggested that teacher ratings are predictive of 
readiness. On what bases would you rate a child as to his possible 
chances of success in beginning arithmetic? 


CHAPTER EIGHT


Appraising Personality 


Educational tests are customarily classified as ability, achieve- 
ment, or personality tests. The teacher must bear in mind, 
however, that fundamentally these different categories rep- 
resent simply different vantage points; the various tests pro- 
vide different views of the pupil. Intelligence or aptitude is 
intimately related to achievement, and both intelligence and 
accomplishment are limited aspects of the total personality 
of the child. In short, all techniques of measurement and eval- 
uation are approaches to the understanding of personality. 
Thus, personality appraisal is treated here in a separate chapter
only for the sake of convenience, for the totality which is
the child can be fragmented only in textbooks and in academic
discussion.


THE MEANING OF PERSONALITY 


Personality is a term which designates the person as he behaves
in his characteristic milieu. It embraces what he is, was,
and can or will be; it is what he hopes to be, loves, hates,
fears, and is confident of, and how he works and plays. Because
of the inclusiveness of the concept, intelligence tests and
achievement tests are at best approaches to the understanding
of the total personality, and we shall deal here only with
the facets of personality that are especially important in the
conduct of school life.

Many of the significant aspects of personality concern rela- 
tionships with others. Ways of adjusting to others, ways of 
relating oneself to them, ease of communication, and trust or 
distrust of both intimates and strangers are significant aspects 
of personality. To measure this tangible and important facet 
of personality, test makers have devised social-adjustment 
questionnaires and inventories and personality schedules con- 
taining a large proportion of questions that sample interper- 
sonal relations. The popular concept of personality concerns 
this social aspect almost exclusively: the word is commonly 
taken to mean attractiveness to others; a person who is almost 
automatically liked by others is said to have a “wonderful 
personality.” This social phase of personality is indeed important;
the creative genius must communicate his ideas to
others, and to the extent that he fails to do this he is popularly
thought to have a “poor” personality. Similarly, the
person who can establish easy contacts with others, even if he
is of limited intelligence, is said to have an effective personality.
Limiting personality to sociability has certain advantages
where testing is concerned, but, as will be shown in the section
entitled “Personality Inventories,” some difficult technical
problems arise in evaluation.

The following definition of personality emphasizes sociability:
personality is the total pattern of behavior and behavior
tendencies as they affect others; it is the adjustment of the individual
as it affects others. Such a definition introduces a difficult
problem in the evaluation of personality, namely, the
fact that the teacher's evaluation of the pupil, or any person's
evaluation of another, is dependent upon the perception of
the evaluator. One's description of another reveals, at least to
some extent, what one is himself. This is true not only in
verbal descriptions but also in instruments designed to appraise
personality. Specifically, an instrument for giving an
indication of adjustment reflects such factors as the interests,
scholastic background, and experience of the author of the
device. It is extremely important to keep this in mind as we
deal with personality appraisal, because it will help us to exercise
the proper restraint in interpreting the instruments that
purport to measure so multifaceted a concept as the human
personality.

Some specialists define personality as a degree of consist- 
ency—the extent to which a person may be depended upon 
to behave in specific ways in his day-to-day conduct. Others 
refer to this degree of consistency as "character." The distinc- 
tion is academic, however, because in the larger sense, char- 
acter, like intelligence and achievement, is one of the many 
aspects of personality. Regardless of whether we call it char- 
acter or personality, the element of consistency is important. 
In fact, the whole object of personality appraisal is to predict 
what the individual is likely to do—how he is likely to be- 
have, what situations probably will upset him—so that we 
may help him more effectively. If it were not for this con- 
sistency, there would be little chance of predicting probable 
reactions. Thus the purpose of instruments for appraising per- 
sonality is to determine those consistent elements by asking 
the subject or persons who know the subject what his responses
to certain situations have been. His future actions are
predicted on the basis of his answers.

Personality refers to inner, unobserved motives and proclivities
as well as to external, observable behavior. The importance
of the “inner man,” the “private world” of the individual,
is indicated by the fact that a substantial amount of
mental ill health, or personality disintegration, is justifiably
attributed to the individual's lack of understanding of his
“true self.” Misunderstanding between persons is often due to
the difficulty of communication between these inner but basic 
and fundamental selves. Hence, another significant approach 
to the study and understanding of personality is the acquiring 
of clues to the nature of these hidden aspects of attitude and 
conduct. 


Misconceptions in Appraising Personality 


Some obvious misconceptions concerning personality need 
to be briefly mentioned. Many people still fail to recognize 
the fallacy of categorizing or “typing” personalities without 
allowing for “in betweens,” in spite of the conclusive evidence 
of psychology and sociology. The idea that there is a relation- 
ship between hair color and temperament has been found to 
be erroneous, yet one frequently hears personality interpreted 
according to this misconception. The fact that there is no
connection between personality characteristics and inherent
racial factors has been demonstrated in psychology and anthropology,
but one still hears unenlightened references of
this type to Negroes, Mexicans, Japanese, and other groups.
Sometimes character traits are associated with religious differences,
e.g., the supposed "selfishness of the Jews." Data from careful
research point to the fact that correlations between race or religion
and character traits are so low as to render invalid any
inferences regarding individuals that are derived from such
generalizations. It is well known that, aside from cultural
factors, the differences between races are slight. The safe
conclusion is that differences between races and religious
groups are much slighter than are the differences within
them.

The fact that one cannot judge personality from appearance
remains for many teachers mere academic knowledge,
for teachers still refer to "bright-looking children" or "obvious
dullards." There are two reasons why teachers should
avoid appraising personality on the basis of appearance.
First, there are clinical “types” of mental defect, such as 
Mongolism, cretinism, hydrocephalism, and microcephalism, 
which have recognizable facial and bodily characteristics. But 
these recognizable features are not present inside the range 
of what are considered to be normal individuals. Thus we will 
miss the mark if we use appearance as the criterion when 
working with typical school children. Second, teachers, like 
other persons, tend to read into what they see what they want 
to see; hence, their judgment of personality, even after a pe- 
riod of acquaintance, must be cautious. 

The belief that there is an accepted norm for personality 
development is a misconception. Some educational and psy- 
chological literature creates the impression that extrovertism 
and sociocentric behavior should be the norm—norm in this 
case meaning a standard. Other competent scholars emphasize 
that "it takes all kinds"—that there is a place in society for 
both the extrovert and the introvert and for all those who 
come between the extremes. Some pupils are well adjusted 
even though they are not highly social or outgoing individ- 
uals. In a democracy, it is recognized that different individuals 
make their contributions to the total welfare of society in dif- 
ferent ways. The school might well take the position that the 
development of uniqueness (within limits, of course) is a defi- 
nite responsibility. Hence, as teachers we should not neces- 
sarily encourage all boys to be athletes, and we need not nec- 
essarily worry about girls who do not seem to enjoy dancing. 
Such differences among pupils need not disturb us so long 
as they do not display symptoms, or patterns of symptoms, of 
less-than-desirable adjustment. The defect of some standardized
tests of personality is that their norms seem to imply that
deviation from a hypothetical average is necessarily an undesirable
thing.

Another problem in dealing with personality is the difficulty
of defining traits. For example, different interpretations
are placed on “honesty,” “application,” “dependability,” and
“adequacy of feelings of personal worth.” This difficulty contrasts
with the difficulty of defining intelligence. Intelligence
has many facets, but most of them are recognized as aspects 
of intelligence. However, when personality traits differ in de- 
gree, they may become something else. Thus, self-reliance is 
an extension of dependence, but if it develops still further it 
becomes selfish egotism. As one outgrows submission, he be- 
comes ascendant, but if the characteristic develops still more, 
the individual is called domineering and, with further development
of the trait, tyrannical. Thus personality testing deals
with varying degrees of different but intimately related traits.
As difficult as it is to accept a measurement of intelligence as
being valid and reliable, it is still more difficult to place
credence in personality tests, since trait definition is even
more elusive than definition of abilities.

A misconception reflected in many tests is that personality 
is static. The fact is that personality is both complex and variable.
People not only experience gradual change, but their actions
are variable within a few moments. This principle does
not conflict with consistency in personality; the variability of
behavior has been partially described in the statement that 
the manifestation of a trait is specific to a situation. For ex- 
ample, a person may be honest when it comes to shunning 
the use of his neighbor's answers on an examination, but he 
may not be honest when it comes to returning extra change 
he has received at the ticket window of a movie. One may be 
neat in the care of his room at home but exceedingly careless 
with the appearance of the spelling and arithmetic papers he 
presents to the teacher. Thus in using adjustment inventories
it is well for teachers to bear in mind the difficulties involved
in getting an accurate picture of “the total situation.”


The Fallacy of Types


There seems to be a well-nigh universal temptation to 
classify people. Such opposites as the good and the bad, the 
white and the black, the new and the old, the traditional and 
the progressive are indicative of this tendency. One of the 
early attempts at classification was a differentiation of body 
types and an attempt to parallel these types with personality 
characteristics. E. Kretschmer postulated three major types of 
body build with corresponding personality attributes—the 
pyknic, the asthenic, and the athletic. Periodic follow-up stud- 
ies have indicated that the "types" are actually continuous 
and therefore the classification is futile. Despite the experi- 
mental evidence, there are periodic recurrences of schemes 
for typifying. Dominance-submission and introversion-ex- 
troversion are not far from such older categories as san- 
guine, choleric, and phlegmatic. The attractiveness of the 
practice of "typing" personalities is exemplified in the re- 
marks of teachers characterizing pupils as being normal or 
abnormal, academic or mechanically minded, and friendly or 
hostile. 

As we have seen, evaluating personality on the basis of ap- 
pearance is dangerous precisely because of the element of 
truth involved. Similarly, "typifying" is dangerous because of 
the degree of validity inherent in the descriptions; there are 
introverted, sanguine, academically gifted, and mechanically 
apt persons. But there are also many people between the two 
extremes, and there are those who possess some of two or 
more characteristics, and there are different manifestations of 
a given trait in various situations. It follows that measures or 
evaluations of personality based on a bimodal, trimodal or 
even multimodal distribution should be interpreted with studied
caution. Although such questionnaires or scales have certain
values, those values do not necessarily reside in the fact
of classification. We may be aided in our understanding of
children by these personality measures, but it is not because 
the pupils have been "typed." 


PERSONALITY RATING SCALES 


A rating blank, scale, or schedule is a formal set of ques- 
tions asked of one person about another or a self-rating form 
in which the individual checks certain questions about himself.
The questions are answered in terms of the degree to
which the individual has the trait or does the act described in 
the question. Thus, the question may be, “What is his (or 
your) attitude when facing difficult schoolwork?” Answers
may be arranged along a continuous line with a mark indicat- 
ing divisions between very poor, poor, average, good, or ex- 
cellent. Such evaluations are, however, considered to be too 
vague to be maximally useful, and descriptive phrases are be- 
lieved to lead to greater accuracy. Thus the item, “How ef- 
fectively does he apply himself to an activity?” is answered in 
a weighted scale, allowing a certain number of points for each 
answer. These answers range from “(1) Shifts about in random
fashion,” through “(3) Sticks to an activity until something
more interesting is presented,” to “(5) Voluntarily pursues
an activity for two or more days consecutively.” Many
of the more recent rating scales use the more precisely descriptive
approach, which has the advantage of strengthening
the objective element in the scale.
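
The weighted, descriptively anchored item described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three phrases are the ones quoted in the text, points 2 and 4 are left without phrases because the text does not give them, and the function names and the item names in the usage note are hypothetical, not part of any published scale.

```python
# Hypothetical sketch of one descriptively anchored rating item.
# Anchors for points 1, 3, and 5 are quoted from the text; the text gives
# no phrases for points 2 and 4, so none are supplied here.
ANCHORS = {
    1: "Shifts about in random fashion",
    3: "Sticks to an activity until something more interesting is presented",
    5: "Voluntarily pursues an activity for two or more days consecutively",
}


def describe(points):
    """Return the descriptive phrase for a rating, if the text supplies one."""
    if points not in range(1, 6):
        raise ValueError("rating must be a whole number from 1 to 5")
    return ANCHORS.get(points, "(no descriptive phrase given for this point)")


def total_score(ratings):
    """Sum the points over all items; ratings maps item name -> points (1-5)."""
    for item, points in ratings.items():
        if points not in range(1, 6):
            raise ValueError(f"{item}: rating must be a whole number from 1 to 5")
    return sum(ratings.values())
```

For example, `total_score({"application": 3, "dependability": 5})` adds the two ratings to give 8. In keeping with the cautions of this chapter, such a total is at most a tentative aid to further observation and never a diagnosis.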

Rating scales are available for classroom use from the
nursery and kindergarten level through the college level and
are sometimes used in business and industry. By means of
them, many different aspects of personality can be investigated:
there are, for example, scales measuring ascendance-submission,
behavior maturity, self-adjustment, delinquent
and predelinquent behavior, attitudes, interests, and social 
adjustment. Sometimes the schedule includes several kinds of 
situations under a single heading; for example, an adolescent 
rating schedule includes fear, family emotion, family author- 
ity, feeling of inadequacy, nonfamily authority, maturity, 
escape, neurotic traits, and compensation. 

Teachers who wish to use personality rating scales are ad- 
vised to consult the current volume of the Mental Measure- 
ments Yearbook, where they will find descriptions of the 
kinds of behaviors which are supposed to be analyzed and the 
levels for which the schedules are specifically designed. More 
pertinent still, the instruments have been critically examined 
and carefully evaluated by scholars in the measurement field. 
The reading of these appraisals will help teachers to come to 
an evaluation regarding each scale which will enable them to 
use the results most accurately and effectively. 

A number of precautions must be observed in using per- 
sonality schedules; some of these were anticipated earlier in 
the chapter. (1) It is just as difficult to formulate a precise 
definition of the traits that are evaluated by means of the 
scales as it is to define personality. (2) There are no widely
accepted norms for what should constitute desirable behavior. 
(3) The "specificity" of behavior makes it unlikely that the 
demonstration of a particular trait in one situation will be an 
accurate sample of that same trait as it might appear in another
context. (4) The element of and danger from subjectivity
is an ever-present complicating factor. The last seven
words of the following statement, taken from an evaluation
in the Mental Measurements Yearbook, are probably pertinent
to all the scales and inventories available at present:
“. . . being little better or worse than the average personality
questionnaire of its kind, this inventory makes up for
none of the serious limitations still inherent in these
instruments.”*


* Cowan Adolescent-adjustment Analyzer: An Instrument of Clinical
Psychology, Salina, Kans.: Cowan Research Project, 1946.

The necessity for caution in the use of an instrument should 
not cause teachers to repudiate it entirely. Rating scales can be 
used to advantage if teachers will observe the following pre- 
cautions: (1) Children should not be labeled predelinquent, 
neurotic, or poorly adjusted as a result of their scores or 
standing on a rating scale. Because of the likelihood of change 
and growth, the importance of the subject’s mood when he 
answered the questionnaire, and the possible influence of the 
mood of the person who interprets the results, the scores 
should not be placed in a permanent record folder. The questionnaire
may, however, be used by the teacher for a temporary
and tentative evaluation. (2) Specific items on the
questionnaire may serve to direct the teacher to a further in- 
vestigation of behavior in a particular area; that is, an atypi- 
cally answered item may suggest other questions that will lead 
to a better understanding of the individual. (3) The teacher 
should bear in mind that the rating scale does not constitute
a diagnosis. It may supply some data which will make effective
diagnosis possible, but in the final analysis the individual
items on the scale and the total score must be interpreted by 
the user of the scale. (4) The data obtained from the rating 
scale should not be regarded as conclusive or infallible. 
Rather they should be regarded as supplementary information 
which provides a test of the validity of data or conclusions ob- 
tained from the teacher's observation of the child. 

The need for the exercise of these precautions is indicated
in the following statement:


Such devices vainly seek the pot of gold at the end of the rainbow:
a simple, cheap, foolproof method for studying human personality.
Teachers, administrators, and school counselors who are
tempted to consider the use of such devices would be benefited by
a psychological insight into the fact that their own great need to do
something about personality problems leads them to the delusion
of accepting instruments of very low objective value.


* Albert Ellis in Oscar K. Buros (ed.), The Third Mental Measurements
Yearbook, New Brunswick, N.J.: Rutgers University Press,
1949, p. 69.
* Laurance F. Shaffer in Buros, op. cit., p. 56.


Rating schedules must be used with consideration for their 
inherent limitations; hence conclusions based upon them must 
be temperate and tentative. 


PERSONALITY INVENTORIES 


A personality inventory is a questionnaire on which the 
subject checks his reactions to a number of specifically de- 
scribed situations. He may be asked how he typically reacts, 
how he thinks he would feel in specific situations, or whether 
certain events have occurred in his life. Examples of each of 
these types of questions are: “Do you cross the street to avoid
meeting someone whom you dislike?” “At an automobile
wreck, would you get sick at the sight of blood?” “Have you
been knocked unconscious by a blow on the head?”

Many other situations and classes of situations are “plumbed”
in an inventory. No one question is considered
crucial; it is the total response to all the questions—the pat- 
tern of the answers—that is considered significant. If the re- 
sults are not interpreted too specifically, the general trend of 
personality orientation indicated is helpful, but as in the case 
of rating schedules, the temptation to label or classify should 
be avoided, even though the enthusiastic test maker may himself
have classified the results.

Some of the difficulties involved in devising instruments to
probe personality have already been suggested. Among these
[...] the test maker creeps into the questions he asks, (3) the
lack of a well-defined norm for social and personal behavior,
and (4) the variability of behavior in diverse situations.

Subjectivity endangers several aspects of personality measurement:
not only is the interpreter of the test subjective, but
so inevitably is the individual taking the test. This aspect of
personality testing is of importance to us here because, thus
far, a [...] view of personality testing has been presented
in the chapter, and the teacher has the right to ask, “If inventories
and scales are so subject to criticism, should they
be used at all?” Actually, the subjective nature of inventories
and scales points up the advantages of what are called projective
techniques, as we shall see later.

As we have seen, each personality is a “private world.” As
the individual grows and develops he learns certain tricks for
protecting himself from “the slings and arrows of outrageous
fortune”—for defending himself from the psychological and
physical batterings which even a protected existence entails.
Critics of extreme behavioristic psychology have pointed out 
that individuals do not react to stimuli in a simple, mechan- 
ical fashion; rather, each individual has a unique response. 
The late J. S. Plant described this private world as follows:* 


Between the need of the child and the sweep of social pressures
lies a membrane—a sort of psycho-osmotic envelope of
transcending importance. . . . One should never think of this as
a tangible, material structure. It is rather a property of that part
of the personality which is in touch with the environment.

It is only through the operation of the envelope that we can get
at the problem of meaning—what anything “means” to the individual.
. . . Certainly one of the most brilliant of the psychoanalytic
contributions has been the theory that one sees the world
only as he can afford to see it—that the material of the environment
is sensed by the personality only in terms of the problems
which it is trying to work through.


* James S. Plant, The Envelope, New York: The Commonwealth
Fund, 1950, pp. 2-3. By permission of the Harvard University Press,
Publishers.


This “envelope” which protects the individual from hostilities 
and contributes to his uniqueness is in turn protected by the 
individual, who practices a measure of self-concealment. Often 
when he does wish to reveal himself he is unable to think 
clearly enough about his feelings to verbalize or describe 
these inner workings. 

Thus, in terms of objectivity, questionnaires suffer from 
two inescapable shortcomings: (1) the inability of the in- 
dividual to evaluate with accuracy his own feelings, and (2) 
the individual's desire to keep his feelings to himself. A third 
shortcoming operates certainly in the upper grades and at the 
high school level, and perhaps even earlier: (3) the desire
deliberately to mislead others. The motive for such behavior 
may not be negative; it may simply be a desire to please the 
teacher, for example. 

Inventories, like scales, should be used only with a proper 
regard for their limitations. They can be used to supplement 
other measures and observations and to help the teacher in- 
vestigate and gain some understanding of a particular area of 
adjustment, such as schoolwork and family or peer relations. 
Atypical responses on a questionnaire may serve as a point 
of departure for a fruitful interview. The teacher should re- 
member, however, that the scores on an inventory do not con- 
stitute a diagnosis. The following statement applies to several 
personality measures that attempt to define behavior precisely 
and categorically: “The worst features of the tests, in the 
opinion of this reviewer, are the elaborate suggestions to 
teachers for the treatment of conditions claimed to be revealed 
by the scores, profiles, and even individual item responses.
When not clearly dangerous, these procedures are stereotyped,
superficial, and lacking in clinical sense.” 


INFORMAL APPROACHES TO 
PERSONALITY EVALUATION 


Some approaches to personality assessment are valuable
because they are admittedly subjective and users cannot escape
the pervasiveness of the subjectivity. Because of the
obvious presence of the personal element, there is much less 
danger that the tester will think he has an accurate measure 
of personality than might be the case with scales and inven- 
tories in which norms have been cited. These fruitful meth- 
ods of personality evaluation are (1) anecdotal records, (2) 
teacher-pupil conferences, and (3) staff meetings. 

The anecdotal record is an attempt to “catch” the child 
in a word picture when he is his typical or average self. The 
teacher describes, without attempting evaluation or interpretation,
a particular youngster as he is performing some characteristic
action. The anecdote is designed simply to indicate
to the teacher, at a later date when evaluation of the child's 
growth is desired, what the child was like at a certain time. 
The child's next teacher may use the anecdote, along with 
other data, to get a more complete picture of the child as he 
has been in the past. Certain precautions should be observed 
in making and using anecdotal records, however; the de- 
scribed action should be a typical one (teachers sometimes 
make the mistake of picking the strange or bizarre action to 
record), and interpretive and evaluative terms should be
avoided in the wording of the behavior descriptions. A good
method for making an anecdotal record is to decide at the beginning
of the day to record the behavior of Albert B. at
[...]. In this way the teacher can gradually acquire anecdotes of
typical behavior for each pupil in his group. The outcome might be
something like the following:*


* Douglas Spencer in Buros, op. cit., p. 58.


September 30. Jackie has been paying special attention to Elsie 
the past few days. He put a piece of bubble gum on her desk, put 
his hands into his pockets, cast his eyes up to the ceiling, walked 
a few steps away, whistling between his teeth. Elsie took the gum, 
raised her eyes, lowered them, said nothing; but Jackie seemed 
satisfied. He has been trying to give her clean notebook paper 
every day. 

October 6. The class chose Jackie and Mort to keep our part 
of the grounds this week. Both stayed in at recess, so the girls 
picked up the paper for them. Mort asked what to do about 
being grounds monitor. Jackie said: “If the kid is littler’n you,
make him pick it up. If the kid is bigger’n you, report him to the
teacher.”


Teacher-pupil conferences in which the teacher does a 
great deal of listening are an excellent way of gaining under- 
standing of a pupil’s personality. In these conferences the 
teacher should play the role of “counselor with” rather than 
“adviser to.” A questionnaire may tell how a person typically 
acts or how he has behaved in the past, but a conference in 
which the teacher listens at least part of the time will produce 
much more information about why the pupil behaves as he 
does. The difficulty with the technique is that teachers tend 
to give too much advice, although analysis of their successful 
experiences in working with pupils shows that solutions were 
discovered only after they had gained, through listening, an 
understanding of how the pupil felt about his difficulties. At 
every age, pupils talk freely with teachers who are patient 
listeners. Often teachers are in such a hurry to get results that
they fail [...]


* Helen Bieker in Fostering Mental Health in Our Schools, 1950
Yearbook, Association for Supervision and Curriculum Development,
Washington: National Education Association, 1950, p. 189.


One relatively unexplored but highly fruitful way of [...]
individual children is the conference composed of a small
number of teachers. In these conferences a teacher mentions
the name of a pupil he would like to discuss on a professional
basis. Teachers who have had [...] will be able to suggest
helpful approaches [...] their insights into his problems.
Frequently teachers who do not know the pupil concerned will make
[...] contributions to such conferences, since a teacher's
[...] and knowledge can sometimes be most fruitfully
brought to bear when he does not know the pupil. One
of the authors has tried this technique frequently by reading
[...] particular case to a group of teachers; the suggested
approaches to the problem involved have often been practically
the same as those suggested by psychologists on the
basis of the test data and interviews.

These informal approaches (anecdotal records, teacher-pupil
conferences, and teacher conferences) are especially
valuable because they do not promise a miraculous conclusion.
They are admittedly subjective; hence there is more likelihood
that appropriate allowance will be made for subjectivity.
The advantage of these techniques over formal instruments
is that teachers can base their subjective judgments and
evaluations on objective data.


PROJECTIVE TECHNIQUES 


As we have seen, personality is, at least in part, the private
world of the individual. To the extent that this is true, one
must, in order to understand the vast realm of emotional experience,
study the individual when he is off guard. Various
projective techniques provide fruitful approaches to this
aspect of personality. Many projective techniques have the
double advantage of providing some degree of therapy during 
the process of study or analysis, for as the child carries on the 
activities that will be observed and interpreted, he is also 
getting rid of some of the tension that is complicating life 
for him. 

A projective technique involves a situation which is mean- 
ingless, ambiguous, amorphous, or neutral. What the person 
being tested does or sees in these meaningless circumstances is 
not dictated by external questions, directions, or demands; his 
actions are an expression of himself. The meanings which he 
believes are present in the pictures or stories are meanings 
which he puts there himself. 

One of the earliest and most widely used projective tech- 
niques, the Rorschach test, presents a series of ink blots such 
as could be made by allowing a drop of ink to fall upon a 
paper from a height and folding the paper over in such a way 
as to produce symmetrical halves. Some of the blots in the se- 
ries are black, some have many colors; being formless, they rep- 
resent nothing. The subject is asked what he sees in them; 
what he reports is, obviously, a projection of himself. The 
scoring and interpretation of the Rorschach blots are involved, 
extensive, and time-consuming processes which require highly 
specialized training. Research is still going on, but there is no 
indication at present that this process will become a routine 
classroom technique. The untrained individual must be warned 
against the dangers of uninformed and irresponsible interpre- 
tation of responses to projective techniques. 

Other projective techniques include a cloud test, in which 
each member of the group tells what he sees in a pictured 
cloud formation, much as children describe their castles in 
the air; a sentence-completion test, in which the subject is 
presented with the first part of a number of sentences and is
asked to complete them in any way he sees fit; and a story-completion
test, in which part of a story is read to the subject,
who is then asked to tell what happened in the rest of
the tale. 

Play techniques consist of giving the child a few toys to 
play with and observing what he does with them, or what he 
has the toys and dolls do. An important element in play techniques
is a high degree of permissiveness (i.e., the child is
[...] that there are no important compulsions or restraints
being placed upon him) which cannot practically be
made a part of the classroom situation. However, the principles
of play techniques are useful to the teacher in that the
child gives a picture of his inner self when he is engaged in
spontaneous play, either alone or with others. By cautiously
interpreting this behavior and evaluating it against other data,
the teacher can see more clearly specific aspects of the child's
personality. The teacher might well be advised to see that interference
with what to the adult are objectionable aspects of
play is held to a minimum, thus allowing the child some
chance to “spill over” with some of his hostile or frustrated
feelings. 

Some practices which, when used informally, we should
perhaps not call projective techniques can be put to immediate
use by teachers in understanding personality orientations.
One of these is free or creative writing. The pupil is
encouraged to write whatever he likes—stories, poems, biographies,
or articles—and criticism of content and composition
is kept to a minimum. When sound rapport exists between
teacher and pupil, trends will frequently appear in the writing
that will serve as diagnostic aids to the teacher. No great reliance
should be placed on single bits of writing; it is the
recurrent theme that is important. Since some youngsters have
difficulty in thinking of what to write, suggestions may be
made: My Favorite Pastime, My Pet Peeve, My Ideal Boy
Friend, My Kind of Father, etc. As with play techniques,
data from free writing should be interpreted cautiously and 
should be regarded as supplementary information. 

Fingerpainting is a favorite technique of many classroom 
teachers for getting revealing glimpses of pupils from the beginning
of their school experience. Some pupils are reluctant
to participate, and even this reluctance, in conjunction with 
other information, may be revealing or suggestive of person- 
ality trends. The kinds of color, the kinds of strokes, and the 
degree of freedom of movement and care employed all may 
give the teacher clues to the meaning of behavior. The au- 
thors do not recommend direct interpretation of these features 
of the paintings, however; they are clues only. Quite apart 
from analysis, many teachers have found that children talk 
more freely when they have a picture to which to point; that 
is, a child may be unable to discuss a feeling such as a resent- 
ment, but he may be able to paint it and describe what he has 
painted. 

Working with clay is another projective technique. As with 
paint, the characteristic way of dealing with the medium, the 
vigor of movement, and the degree of satisfaction or discon- 
tent with the product are all elements that might be involved 
in the interpretation. 

The advantage of projective techniques, in the main, is that 
they have not become routine, stereotyped, and standardized. 
There is, of course, the danger that the user will project him- 
self into the interpretations and conclusions, but the teacher 
who attempts to interpret what he sees in a child’s writing, 
play habits, and art processes and products fully realizes that 
the interpretation is wide open to error; consequently he is 
careful in its use. Such data, cautiously used, may frequently 
be more helpful in the evaluation of personality than the results
of standardized instruments that give apparently accurate
statistical interpretations of data that are necessarily subjective
and approximate.*


SUMMARY 


[...] for measurement or evaluation are in reality
approaches to the understanding of some aspect of
personality. The word personality is an inclusive term embracing
[...], inner feelings, and what others think of one. It is
[...] that a thing so complex and ever-changing cannot
[...] in a mathematical sense. The difficulty
[...], however, should not lead the teacher to a repudiation
of the available instruments. Rather, an understanding
of the complexity of the problem should underscore the
need for proper caution and reservation of final judgment.

Rating scales are designed to systematize judgments or observations
regarding oneself or others. The shortcomings of
these are that they are subjective and must of necessity
be limited by the particular questions asked—the things the
test maker thinks are most significant. If the teacher uses
them with these limitations in mind, they provide useful corrective
or corroborative information. If, however, conclusive
judgments are based on the attractive norms, the results will
be unfortunate for many pupils.

Personality inventories are subject to limitations similar to
those of rating scales. They have their place in evaluation
[...] where they may be used as the starting point for an
interview or as supplementary data. If the teacher finds himself
attracted by the statistical norms which sometimes accompany
such tests, he would do well to heed the words of
George G. Thompson: “[It] is surprising (in the face of this
preponderance of negative research findings) that these personality
questionnaires should continue to be so widely used
in school and youth-guidance organizations!”


* Another approach to the understanding of personality—functioning
in a group, or sociometry—will be examined in Chap. 8. This
technique is available to classroom teachers without any outlay for
equipment save perhaps a book which explains in detail some of the
problems involved in the approach.

Some recent developments in personality evaluation give 
indications of overcoming some of the defects of older meas- 
ures. Inclusively, these tools are called projective techniques. 
They include art, free writing, and spontaneous play used for 
the purpose of gaining an understanding of children. One ex- 
planation of the value of these instruments is that they are ad- 
mittedly subjective and approximate. They unearth clues or 
furnish supplementary data. Specifically, although one would 
not be justified in concluding from a child's drawings that he 
has a mother fixation, one can discover signs of emotional 
tensions that should be more carefully studied in home visits, 
interviews, and further psychological investigation. 

The evaluation of personality is an inescapable responsi- 
bility of the school, since evaluation must precede construc- 
tive help. The instruments available today for evaluating per- 
sonality are tools for increasing the accuracy of the teacher's 
perception, just as the stethoscope increases the accuracy of 
the doctor's diagnosis. The fact that personality instruments
are imperfect indicates only that they should be used with
appropriate regard for their shortcomings, for they provide
a means of arriving at a tentative evaluation of certain aspects
of the child's personality.


* George G. Thompson, Child Psychology, Boston: Houghton Mifflin
Company, 1952, p. 614.


STUDY AND DISCUSSION EXERCISES


1. What is the significance for teachers of the statement, “When
one describes the personality of another, he reveals himself”?

2. Point out some instances in typical everyday conversations
which indicate the tendency to classify persons as personality
types.

3. [...] the reviews of three or four well-known personality
[...], using the Mental Measurements Yearbook. Do the reviews
[...] or contradict the views presented in this chapter?

4. Formulate a list of suggestions which would help teachers to
[...] their own subjective evaluations of pupils more constructively.

5. [...] the Education Index and find and report on some
[...] published in the last six months having to do with the use
of projective techniques by classroom teachers.

6. Which would you consider to be more important for a boy
[...] in social adjustment at school—a factual study
of his home and community or an interview which reveals how he
feels about his home and community?

7. Evaluate this statement: Personality is formed in the first
six years of life.

SUGGESTED ADDITIONAL READINGS 


Bernard, Harold W.: Mental Hygiene for Classroom Teachers,
New York: McGraw-Hill Book Company, Inc., 1952, pp. 297-[...].
Three chapters deal with the role of writing in the release of
tensions and interpretation of personality, with art as an approach
to understanding personality, and with play and drama
as classroom techniques in pupil understanding.
Bieker, Helen: “Using Anecdotal Records to Know the Child,”
in Fostering Mental Health in Our Schools, 1950 Yearbook, Association
for Supervision and Curriculum Development, Washington:
National Education Association, 1950, pp. 184-202.
This is a condensed account of the aims, techniques, and advantages
of the anecdotal record. It provides background material
which prepares one to experiment for himself.
Cattell, Raymond B.: Personality, New York: McGraw-Hill Book
Company, Inc., 1950, chap. 4.
A scholarly description and evaluation of various techniques
for testing personality. The discussion points up the difficulties
inherent in the problem of personality assessment.
Kaplan, Louis, and Denis Baron: Mental Hygiene and Life, New
York: Harper & Brothers, 1952, pp. 52-80. 
This chapter discusses the origin and meaning of personality. 
The uniqueness of personality, rather than the division into 
types, is described. 
Klopfer, Bruno, Mary D. Ainsworth, Walter G. Klopfer, and 
Robert R. Holt: Developments in the Rorschach Technique, 
Yonkers, N.Y.: World Book Company, 1954. 
This book contains a detailed description of the technique and 
theory of Rorschach tests. It will be of interest to the student 
who wishes to specialize in clinical testing. 
Olson, Willard C.: "Personality," in Walter S. Monroe (ed.), En- 
cyclopedia of Educational Research, rev. ed., 1950, pp. 806-817. 
The greater part of this article is devoted to a critical examina- 
tion of the uses and shortcomings of methods for appraising 
personality. An extensive bibliography for further study is in- 
cluded. 
Thompson, George G.: Child Psychology, Boston: Houghton Mif- 
flin Company, 1952, chap. 14. 
Approaches to the evaluation of personality are discussed in 
terms of the theoretical constituents of personality and the kind 
of development that seems to be culturally expedient. 


CHAPTER NINE 


Evaluating Classroom Social Relationships 


In school, the child is brought into contact with other 
children in a social situation which influences his academic 
achievements and his personal and social adjustment. One of 
the important tasks which face the child of school age is the 
development of satisfying relationships with his peers. Ade- 
quate relationships minister to the child's need for social 
acceptance and the approval of his age mates. The child's 
attitudes toward life and learning in the school situation may 
be either favorably or unfavorably influenced by the nature 
of the social climate of the classroom. School learning cannot 
be isolated from the social setting in which it occurs. 


FACTORS RELATED TO SOCIAL ACCEPTANCE 

Studies of social acceptability among children of 
school age have pointed to a number of considerations which 
are of importance to the teacher. Children tend to select their 
friends from their neighborhood and classroom groups¹ and 
on the basis of similarity in chronological and mental age.² 
Physical condition, proficiency in playground activities, and 
neuromuscular skill play a significant role in social accepta- 
bility during the school years.³ 

¹ May V. Seagoe, "Factors Influencing the Selection of Associates," 
Journal of Educational Research, 27:32-40, 1933. 

Social acceptance is also related to scholastic achievement. 
For example, “best-liked” children are typically superior to 
unpopular children in scholastic ratings and in reading 
achievement. It has been demonstrated, too, that children 
tend to choose as friends those classmates who are somewhat 
similar to themselves in mental age and scholastic achieve- 
ment.⁵ Children who have been retarded in school are fre- 
quently among the unchosen individuals in the group and are 
likely to display problems in social and emotional adjustment.⁶ 

Social acceptance in the classroom is related to the personal 
and social characteristics of the individual. Popular children 
are typically more self-confident and emotionally stable than 
unpopular children⁷ and show a greater degree of outgoing 
energy. 

Thus the social status of a child among his peers is related 
to developmental characteristics and environmental factors. 
The fact that a child's acceptance status tends to remain rela- 
tively constant from year to year⁸ indicates that problems of 
social acceptance represent an area in which guidance is 
needed and should be an important concern of teachers. 
The relationships between social acceptability 
and achievement and personality factors indicate that questions 
of promotion or nonpromotion, acceleration, reorganization 
of classroom groups, changes of school, neighborhood, or 
[...] may represent crucial decisions from the point of 
view of adjustment and learning. 


² R. Pintner, G. Forlano, and H. Freeman, "Personality and Atti- 
tudinal Similarity among Classroom Friends," Journal of Applied 
Psychology, 21:48-65, 1937. 

³ B. Grossman and J. Wrighter, "The Relation between Selection- 
Rejection and Intelligence, Social Status, and Personality among Sixth- 
grade Children," Sociometry, 11:346-355, 1948. 

⁴ M. C. Hardy, "Social Recognition at the Elementary School Age," 
Journal of Social Psychology, 8:365-384, 1937. 

⁵ D. S. Belden, "A Study of the Nature of Social Structure" (un- 
published), Division of Research and Guidance, Los Angeles County 
Schools, 1942. 

⁶ A. A. Sandin, "Social and Emotional Adjustments of Regularly 
Promoted and Nonpromoted Pupils," Child Development Mono- 
graphs, 1944, no. 32. 

⁷ D. Baron, "Mental-health Characteristics and Classroom Social 
Status," Education, 69:306-310, 1949. 


STUDYING SOCIAL RELATIONSHIPS 


In the course of his everyday activities the teacher has 
many opportunities to observe children working and playing 
together. When such observations become systematized, 
the data which they provide are likely to become 
particularly valuable. Scientists have developed a number of 
techniques designed to systematize the study of social rela- 
tionships. One of these techniques, the sociometric method,⁹ 
is designed to facilitate the study of individuals in groups and 
is readily applicable to the classroom situation. The method 
involves the selection of associates for group activities. In the 
classroom, pupils may be asked to choose seatmates [...] 
been developed. 

In group situations, certain patterns of attraction, neglect, 
and rejection develop among individuals. In classroom groups, 
for instance, some children become the focal points of attrac- 
tion and their company is eagerly sought by many members 
of the group. Other children may be overlooked when asso- 
ciates are selected, and still others may be actively rejected as 
companions. The social "climate" of a classroom is profoundly 
influenced by the pattern of interrelationships which prevails 
among members of the group. One classroom group may be 
drawn together by numerous attractions which extend through- 
out its membership; this situation facilitates united, coopera- 
tive effort. Another group may be composed of mutually ex- 
clusive subgroups; in such classes the possibilities for coopera- 
tive group activities are minimized.¹⁰ The sociometric method 
enables the teacher to obtain information concerning the pat- 
tern of relationships which forms the social "climate" in which 
pupils live in his classroom. 


⁸ M. E. Bonney, "The Relative Stability of Social, Intellectual, and 
Academic Status in Grades II to IV and the Interrelationships between 
These Various Forms of Growth," Journal of Educational Psychology, 
34:88-102, 1943. 

⁹ J. L. Moreno, Who Shall Survive? Washington: Nervous and Men- 
tal Disease Publishing Company, 1934, pp. 12-14. 


The Sociometric Question 


In the sociometric method, pupils are asked to choose the 
associates they would prefer for a specific situation. The ques- 
tion might be, "Which three of your classmates would you 
prefer to have as your best friends?" This is a general question 
which implies no forthcoming action. A more specific question 
ideally would imply subsequent action: "We have decided to 
have a puppet show. Which of your classmates would you 
prefer to work with in preparing the show?" The teacher 
should design sociometric questions in such a way as to elicit 
valid or real preferences. Choices are likely to be most valid 
when the situation is real and meaningful and when the pupils 
are assured that the choices will be acted upon. The following 
question, for example, meets these criteria: "You are seated 
now according to a plan which seemed convenient. You have 
now had a chance to become acquainted with each other and 
perhaps would like to be seated near someone of your own 
choice. Which (two, three, four) of your classmates would 
you like to have seated near you? You will be seated near at 
least one of the persons you choose." In this statement, the 
situation, the purpose, and the number of preferences allowed 
are specified. Finally, assurance is given that the results 
will be utilized. 


¹⁰ Hilda Taba et al., Diagnosing Human-relations Needs, Washing- 
ton: American Council on Education, 1951, p. 71. 

The following principles will help the teacher in framing 
sociometric questions: 

1. Give pupils a good reason for listing their preferences. 

2. Present your plans for utilizing the choices. 

3. Plan and word the directions carefully so that pupils 
will understand clearly what is wanted. 

4. State the question in such a way that pupils fully under- 
stand it. 

The sociometric question should be formulated in terms of 
the actual situation and the purposes of the teacher. However, 
the list of questions which follows may suggest some areas 
which provide meaningful situations: 

1. Which of your classmates do you prefer to have seated 
near you? 

2. Some of you are having difficulty with your work. Which 
boys or girls do you choose to help you? 

3. We are going to plan a field trip. Which boys or girls 
do you prefer to work with on the planning committees? 

4. We have planned a project in social studies. Which boys 
or girls would you like to have as members of your group? 

5. We are going to select groups for games on the play- 
ground. Which of your classmates do you prefer as members 
of your group? 

6. The other day we decided to hold a class picnic. Which 
boys or girls do you choose as members of the planning com- 
mittee? 

7. We plan to have a class party. We will be seated in 
groups around tables for lunch. Which of your classmates do 
you wish to have seated at your table? 

8. A group of pupils is to plan a program for a parent- 
teacher meeting. Which of your classmates should represent 
our class on the planning committee? 

The following suggestions will help the teacher in the prep- 
aration and administration of sociometric questions: 

1. Utilize realistic and meaningful choice situations which 
bear a definite relationship to the activities of the group. 

2. Word the question in such a way that the pupils under- 
stand its purposes and significance. 

3. Have a few pupils prepare a list of the first and last 
names of the members of the group. The lettering should be 
large enough so that all the pupils can read the names. 

4. Allow sufficient time for pupils to record their choices. 

5. Have pupils list their choices on a small sheet of paper or 
a 3- by 5-inch card. Each pupil should sign his paper or card 
so that he may be identified. It helps to have a sample of the 
choice blank presented on the chalkboard. A suggested form 
for recording sociometric choices is presented in Figure 9. 

6. Indicate precisely the number of choices which each 
pupil is to make. The number of choices requested will vary 
with the sociometric question, the purposes of the teacher, and 
the practical problem of the amount of time available for tab- 
ulation and evaluation of the results. Certain authorities sug- 
gest three choices by each pupil as the most practical num- 
ber.¹¹ Other investigators indicate that larger numbers of 
choices result in increased validity.¹² The age of the pupils is 
a further consideration, since children in the primary grades 
typically choose fewer associates than children in the middle 
and upper grades. 

7. Explain the range of choice. Ordinarily choices are lim- 
ited to members of the classroom group exclusive of the 
teacher. In certain situations the range of choice may be 
wider or more limited. The teacher should specify whether or 
not pupils who are absent are eligible for choice. 


YOUR NAME. Beverly A.              DATE. May 1, 1957 
           (first) (last) 
GRADE. 3     SCHOOL. P.S. 14       TEACHER. Miss Smith 

QUESTION: With whom would you like to work on our project 
in social studies? 

CHOICES 
     First Name.      Last Initial. 
1. 
2. 
3. 
4. 
5. 

Fig. 9. Suggested form for recording sociometric choices. 


¹¹ Ibid., p. 76. 

¹² E. Eng and R. L. French, "The Determination of Sociometric 
Status," Sociometry, 11:368-371, 1948. 


Scoring and Tabulating the Results. In many instances it is 
not necessary to weight or give score values to the choices. In 
such cases the pupil's sociometric "score" is the total number 
of choices he receives from members of the group. In other 
cases the teacher may wish to consider the order of preference 
and assign arbitrary score values to choices in terms of the 
rank of the choice, as first choice, second choice, third choice. 
For example, in a situation where five choices are requested, 
a first choice might be assigned five points, a second choice 
four points, a third choice three points, and so on. Such scor- 
ing is arbitrary and does not necessarily reflect the actual 
value or intensity of the preference. Where a system of 
weighted scores is to be used, however, the directions to pupils 
should include the request that associates be listed in order 
of preference. 
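The two scoring plans just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `sociometric_scores`, the dictionary layout, and the three pupils are assumptions made for the example and are not data from the text.

```python
from collections import defaultdict


def sociometric_scores(choices, weights=None):
    """Return {pupil: score} from {chooser: [chosen, ...]} in order of preference.

    With weights=None every choice counts 1, so the score is simply the
    number of choices received. Otherwise weights[i] is the score value
    of the (i + 1)-th choice, e.g. [5, 4, 3, 2, 1] for five choices.
    """
    scores = defaultdict(int)
    for chooser, chosen_list in choices.items():
        scores[chooser] += 0  # keep pupils who receive no choices, with score 0
        for rank, chosen in enumerate(chosen_list):
            scores[chosen] += 1 if weights is None else weights[rank]
    return dict(scores)


# Hypothetical three-pupil group (illustrative names only).
choices = {
    "Ann": ["Bea", "Cal"],
    "Bea": ["Ann", "Cal"],
    "Cal": ["Ann", "Bea"],
}
unweighted = sociometric_scores(choices)
weighted = sociometric_scores(choices, weights=[5, 4, 3, 2, 1])
print(unweighted)
print(weighted)
```

In this small group every pupil receives two choices, so the unweighted scores are equal; the weighted plan separates them only because it takes the order of preference into account, which is why pupils must be asked to list associates in order when weights are used.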


The tabulation sheet should contain a complete record of 
the results of the sociometric test. It should include all the 
data needed to identify the group, the date of the test, the 
nature of the question, the number of choices requested, the 
method of scoring, and other information (such as the num- 
ber of pupils absent on the day of the test) which may be 
important in interpretation of the data in the record. The tab- 
ulation plan represented in Figure 10, which presents the 
complete record for a sample group, is one of a number of 
methods which meet these requirements: 

1. Essential data are indicated at the top of the sheet. 

2. The tabulation sheet is blocked off in cells, one row and 
one column for each pupil in the group. First names and 
initial of last names are listed across the top and down the side 
of the tabulation sheet. 

3. Girls and boys are listed separately in alphabetical order 
according to initial of last name. A vacant row and column 
separate the two lists. This type of listing helps in the analysis 
and interpretation of the results. 

4. The columns represent preferences indicated on the 
question blanks. Choices are entered in the cell where the 
column under the name of the pupil chosen is intersected by 
the row opposite the name of the chooser. For example, Bev- 
erly A. (first column) is chosen as an associate by June B. 
The figure 5 under Beverly and opposite June indicates that 
this is a first choice. 

5. To facilitate tabulation of choices, the choice blanks 
(Figure 9) can be arranged in the order in which choosers' 
names appear on the tabulation sheet. Each choice can then 
be listed along the rows under the name of the pupil chosen. 
The score value of each choice is indicated if score values are 
used. 

6. The sociometric "score" of each pupil is derived by sum- 
ming the columns, as indicated opposite A in Figure 10. For 
example, Beverly's sociometric score is 10, the sum of the 
score values in the column under her name. 

7. If the teacher wishes, he may also indicate the number 
of choices of each rank and the total number of choices re- 
ceived by each pupil. These figures are shown opposite B and 
C in Figure 10. 


[Fig. 10. Sociometric tabulation sheet. Heading data: Date of 
test, 4/15/57; School, P.S. 14 (city or town); Grade, Third; 
Teacher, Miss Smith; Question, "With whom would you like to 
work on our project in Social Studies?"; No. of choices, 5; 
Scoring, first choice 5 points, second 4, third 3, fourth 2, 
fifth 1. Rows are labeled "Choices By" and columns "Choices 
Received"; the pupils listed include Sharon L., Carol T., 
Patricia W. (absent), Dennis T., Paul V., and Jimmy Y. Summary 
rows: A, total score; B, number of first, second, third, fourth, 
and fifth choices; C, total choices received.] 
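The tabulation plan above amounts to filling a chooser-by-chosen grid and summarizing its columns. The sketch below shows one way this could be done in Python; the function names `tabulate` and `column_summary` and the four-pupil group are hypothetical, and the sketch assumes that the score values are distinct and nonzero so that a cell value identifies the rank of the choice.

```python
def tabulate(pupils, choices, weights):
    """Build the sheet: sheet[chooser][chosen] = score value of that choice (0 if none)."""
    sheet = {chooser: {chosen: 0 for chosen in pupils} for chooser in pupils}
    for chooser, chosen_list in choices.items():
        for rank, chosen in enumerate(chosen_list):
            sheet[chooser][chosen] = weights[rank]
    return sheet


def column_summary(pupils, sheet, weights):
    """Summarize each column as rows A, B, and C of the tabulation sheet."""
    summary = {}
    for chosen in pupils:
        column = [sheet[chooser][chosen] for chooser in pupils]
        by_rank = [column.count(w) for w in weights]  # B: choices received at each rank
        summary[chosen] = {
            "A": sum(column),    # A: total score
            "B": by_rank,
            "C": sum(by_rank),   # C: total choices received
        }
    return summary


# Hypothetical group with two choices each, scored 2 and 1.
pupils = ["Ann", "Bea", "Cal", "Dot"]
choices = {
    "Ann": ["Bea", "Cal"],
    "Bea": ["Ann", "Cal"],
    "Cal": ["Ann", "Bea"],
    "Dot": ["Ann", "Bea"],
}
weights = [2, 1]
sheet = tabulate(pupils, choices, weights)
summary = column_summary(pupils, sheet, weights)
for name in pupils:
    print(name, summary[name])
```

A pupil who is absent simply has no entry in `choices`; the row stays empty while the column still records any choices received.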

The tabulation sheet presents a summary statement of the 
results of the test, indicating the choice status of the pupils. 
In our example, Jimmy Y., with 41 points, leads the group in 
sociometric score. The leading girl is June B., with 29 points. 
No pupil is unchosen, but the lowest scores are those of 
Dolores K. (3 points) and Jackie S. (4 points) among the 
girls and Bob C. (4 points) among the boys. 

The tabulation sheet may be easily preserved for future 
reference and comparisons, and it is a basic work sheet for the 
teacher who plans to further analyze the results of testing. 
From the tabulation sheet, sociograms may be developed as 
a means of further clarifying relationships within the group. 


The Sociogram 


The sociogram is designed to portray graphically the choice 
relationships which are recorded on the tabulation sheet. The 
method presented here makes use of the target diagram as a 
means of representing sociometric data in graphic form. This 
type of sociogram is based on a series of concentric circles 
bisected vertically, as shown in Figure 11. The small circles 
to the left of the center line represent girls and the rectangles 
to the right of the line represent boys. Pupils who rank in the 
highest 25 per cent of the group are located, by initial, in the 
inner circle. In the outer circle are those pupils who comprise 
the lowest 25 per cent in sociometric score. The location of 
pupils within the various circles roughly approximates their 
sociometric rank. 

[Fig. 11. Sociogram representing interrelationships among pupils in 
the third grade as indicated by their responses to the question, "With 
whom would you like to work on our project in social studies?" First, 
second, and mutual choices are represented. Heading data: Date of 
test, 4/15/57; School, P.S. 14 (city or town); Grade, Third; Teacher, 
Miss Smith; No. of choices represented, 2. Key: circles for girls, 
squares for boys, and separate arrow styles for first, second, and 
mutual choices.] 
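The placement rule described above can be sketched as a small Python function. The name `assign_rings`, the single "middle" band for everyone between the two extremes, and the way ties and fractional group sizes are handled are all assumptions made for illustration; the text fixes only the upper and lower 25 per cent.

```python
def assign_rings(scores, fraction=0.25):
    """Place each pupil in the inner, middle, or outer band by sociometric score.

    The top `fraction` of pupils go to the inner circle and the bottom
    `fraction` to the outer circle. Ties are broken alphabetically and the
    band size is rounded down (but never below 1); both are assumptions.
    """
    ranked = sorted(scores, key=lambda pupil: (-scores[pupil], pupil))
    k = max(1, int(len(ranked) * fraction))
    rings = {}
    for position, pupil in enumerate(ranked):
        if position < k:
            rings[pupil] = "inner"
        elif position >= len(ranked) - k:
            rings[pupil] = "outer"
        else:
            rings[pupil] = "middle"
    return rings


# Hypothetical scores for eight pupils, identified by initial.
scores = {"A": 20, "B": 15, "C": 12, "D": 9, "E": 7, "F": 5, "G": 4, "H": 0}
rings = assign_rings(scores)
print(rings)
```

With eight pupils the inner and outer circles each hold two pupils and the remaining four fall between them, which matches the rough approximation of rank that the target diagram is meant to give.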

The following suggestions will help the teacher prepare a 
sociogram of this type: 

1. Use a large sheet of paper for a trial form. 

2. Draw the concentric circles and bisecting line. 

3. Fill in the necessary identifying data (i.e., grade, school, 
date, teacher, question, number of choices to be represented, 
score values if any, meaning of the symbols employed). 

4. Within the innermost circle indicate the boys and girls 
who, according to score, rank in the upper 25 per cent of the 
group. Disperse these symbols within the circle. 

5. Indicate the relative positions of pupils within the sec- 
ond circle. Distribute the symbols throughout the available 
space. 

6. Locate the pupils with the lowest sociometric scores 
within the outer circle. These symbols should ideally be lo- 
cated so that lines can be drawn directly to the symbols in the 
central circle. 

7. Draw lines representing the direction of the first choices 
of pupils in the group, using arrow tips to indicate direction of 
choice, as in Figure 11. The number of lines can be reduced 
by using a single line with double arrow tip and bar to indi- 
cate mutual choices. 

8. Indicate second choices similarly by using a dotted or 
colored line. Other levels of choice may be indicated if the 
teacher wishes. 

9. Study the trial sociogram for ways to relocate symbols 
in order to improve the clarity of the diagram and estimate the 
number of choices which can be satisfactorily depicted for the 
group. 

Sociograms are of great assistance to the teacher in study- 
ing relationships among pupils. The following list suggests 
some of the advantages they offer: 

1. Relative sociometric ranks are revealed graphically. 

2. Directions of choice and extent of mutuality of choice 
are indicated. 

3. Heavy concentrations of choice are revealed. 

4. Choices which run across sex lines are clearly indicated. 

5. The teacher can readily identify individuals and study 
their choice relationships with others in the group. 

6. Possibilities for grouping pupils in a psychologically 
meaningful way are portrayed. 

7. The popular individuals, the unchosen, mutual pairs, 
and subgroups are graphically depicted. 

For example, the following noteworthy features are re- 
vealed concerning the group studied in Figure 11: 

1. There are more girls than boys in the circle representing 
high choice status. 

2. Choices of boys are heavily concentrated on Jimmy Y. 
Choices of girls show greater dispersion. 

3. Boys in this group frequently select girls as working 
companions, but girls seldom select boys. 

4. By comparison with the total number of choices de- 
picted, the number of mutual choices is relatively small. 

5. There are indications of some rather closely knit groups, 
particularly in terms of first choices. 

6. No pupil, with the exception of D. K., a girl, fails to re- 
ceive either a first or second choice. This would seem to indi- 
cate relatively good relationships throughout the group. A fur- 
ther indication of a good dispersion of attractions is seen in 
the chaining of choices, as with P. V., E. R., E. E., and J. Y. 

A further method of graphic representation is presented in 
Figure 12, which depicts the choice relationships of an indi- 
vidual pupil, Dolores K. The direction of choice is again rep- 
resented by arrow tips, and the score values of choices are 
indicated near the inner circle. For example, Dolores gives 
her first choice (5 points) to D. B., one of the boys of the 
group. She gives her second choice (4 points) to I. N., and 
is the third choice (3 points) of I. N. The mutuality of choice 
is indicated by double arrow tips and bar. This type of dia- 
gram helps to clarify the choice relationships of individuals 
whom the teacher may wish to study further and is useful in 
working out committee and other classroom groupings. 

[Fig. 12. Diagrammatic representation of the sociometric relationships 
of Dolores K. Heading data: Pupil, Dolores K.; Grade, 3; School, 
P.S. 14 (city or town); Sex, F; Age, 8-8; Date, 4/15/57; Question, 
"With whom would you like to work on our project in social studies?"; 
Choices, 5; Score values, 5, 4, 3, 2, 1 (in terms of rank of choice).] 
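The information carried by an individual diagram of this kind (choices given, choices received, and mutual choices) can also be pulled directly from the choice blanks. The following Python sketch assumes a hypothetical function `pupil_relationships` and an invented four-pupil group; the pattern for "Dee" merely imitates the situation described for Dolores K., who gives a second choice to a classmate and receives that classmate's third choice.

```python
def pupil_relationships(pupil, choices, weights):
    """Summarize one pupil's choices given, choices received, and mutual choices."""
    # Score value of each choice the pupil makes, keyed by the classmate chosen.
    given = {
        chosen: weights[rank]
        for rank, chosen in enumerate(choices.get(pupil, []))
    }
    # Score value of each choice the pupil receives, keyed by the chooser.
    received = {
        chooser: weights[chosen_list.index(pupil)]
        for chooser, chosen_list in choices.items()
        if pupil in chosen_list
    }
    mutual = sorted(set(given) & set(received))
    return {"given": given, "received": received, "mutual": mutual}


# Hypothetical group (illustrative names only), scored 5, 4, 3.
choices = {
    "Dee": ["Bo", "Ivy"],
    "Ivy": ["Ann", "Bo", "Dee"],
    "Bo": ["Ann", "Ivy"],
    "Ann": ["Ivy", "Bo"],
}
dee = pupil_relationships("Dee", choices, weights=[5, 4, 3])
print(dee)
```

Here the only mutual relationship is with "Ivy", even though the two choices differ in rank, so the summary points to the one classmate through whom the pupil might most readily be drawn into a working group.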


ANALYZING AND INTERPRETING RESULTS 


Some of the interrelationships within a group are readily 
discerned in the results of sociometric testing. Still other sets 
of relationships are not so clearly defined but may be clarified 
by various techniques of representation and analysis. The so- 
ciogram and the diagrammatic representation of the choice 
relationships of individuals offer possibilities for clarification 
and analysis of the data. 

The teacher will undoubtedly study the results by means of 
questions which apply to his unique situation. The following 
questions may serve as leads in developing this type of 
analysis.¹³ 

1. Do the choices center upon a few pupils, or are they 
relatively well dispersed? In our example, almost 20 per cent 
of the choices received by boys of the group are centered 
around Jimmy Y. The teacher may wish to consider possible 
reasons for the popularity of individuals with high choice 
status. Such pupils may play important roles in determining 
classroom morale and leading the activities of the group. 

2. Are there pupils who receive no choices? Typically 
there are "isolates" in every class. The proportion of unchosen 
children is ordinarily highest in the kindergarten and the first 
two grades. In the third-grade group we have been discussing, 
there is no child who is unchosen and only one pupil, Do- 
lores K., who fails to receive either a first or second choice. 
In a two-choice situation, Dolores would be considered an 
isolate. Observation of unchosen children may reveal be- 
havioral or other factors which interfere with their acceptance 
by their peers and may suggest ways in which teachers can 
help these pupils establish themselves as accepted members of 
the group. 

3. Do choices cross sex lines, or is there a rather definite 
cleavage between girls and boys? During the first three years 
of school there is generally less cleavage between the sexes 
than in the succeeding three years. In the case of our third- 
grade group, cleavage along sex lines is especially marked in 
the case of the girls. This pattern is fairly common during the 
early school years and is an aspect of boy-girl relationships 
which is an important consideration in the grouping of pupils 
during this period. 


¹³ Adapted in part from Taba et al., op. cit., pp. 83-86. 

4. Is there a satisfactory degree of mutuality in the choice 
patterns? Mutuality of choice is ordinarily likely to indicate 
satisfying relationships. However, in some instances pupils 
may pair off to form small, tightly closed groups. In our ex- 
ample, a considerable degree of mutuality is evidenced when 
all choices are considered. Furthermore, considerable "chain- 
ing" is evidenced, which seems to be indicative of a series of 
attractions which run through and knit together the groups of 
boys and girls. For instance, although a triangular chain rela- 
tionship of first choices links S. L., L. M., and J. D., the pat- 
tern of second choices indicates that members of this group 
have good relationships with others of their classmates. 
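The four questions above lend themselves to simple counts once the choices are recorded. The Python sketch below is one possible way of computing them; the function name `analyze_group`, the measures chosen (share of all choices going to the most-chosen pupil, pupils with no choices, share of choices crossing sex lines, and mutual pairs), and the five-pupil group are assumptions for illustration rather than procedures given in the text.

```python
def analyze_group(choices, sex):
    """Return simple counts bearing on concentration, isolates, cleavage, and mutuality."""
    pupils = list(sex)
    received = {pupil: 0 for pupil in pupils}
    total = 0
    cross = 0
    for chooser, chosen_list in choices.items():
        for chosen in chosen_list:
            received[chosen] += 1
            total += 1
            if sex[chooser] != sex[chosen]:
                cross += 1
    top = max(pupils, key=lambda pupil: received[pupil])
    mutual_pairs = sorted(
        (a, b)
        for a in pupils
        for b in choices.get(a, [])
        if a < b and a in choices.get(b, [])  # a < b lists each pair once
    )
    return {
        "top": top,
        "top_share": received[top] / total,
        "isolates": sorted(p for p in pupils if received[p] == 0),
        "cross_sex_share": cross / total,
        "mutual_pairs": mutual_pairs,
    }


# Hypothetical group (illustrative names only), two choices each.
sex = {"Ann": "F", "Bea": "F", "Cal": "M", "Dan": "M", "Eve": "F"}
choices = {
    "Ann": ["Bea", "Cal"],
    "Bea": ["Ann", "Cal"],
    "Cal": ["Dan", "Ann"],
    "Dan": ["Cal", "Bea"],
    "Eve": ["Ann", "Cal"],
}
report = analyze_group(choices, sex)
print(report)
```

Counts of this kind only raise the questions; as the text stresses, the reasons behind a concentration of choices or an unchosen pupil still have to be sought through observation.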

The above suggestions offer some basic possibilities for 
study of the results of sociometric testing. The teacher will un- 
doubtedly note other relationships which are of special inter- 
est to him; for example, he may look for the choice patterns 
that he expected to find. It is also profitable to look for the 
unexpected; in fact, a most common reaction of teachers 
using the sociometric design for the first time is surprise at 
seeing relationships which they had not previously realized. 
Frequently, for example, the teacher may find that a pupil is 
more or less popular than he had expected; or he may find 
lines of attraction taking unexpected directions or intensities. 
Such events merit special study and may increase the teach- 
er's understanding of his pupils. 


Utilizing Results 


The feelings and attitudes, attractions and repulsions which 
pervade the group inevitably influence the learning activities 
of the classroom. Sociometric data may enable the teacher to 
develop more satisfying, meaningful, and effective learning 
situations, since they reveal who the preferred leaders and as- 
sociates of the pupil group are. When accompanied by first- 
hand observation of the pupil leaders, a knowledge of the 
leadership roles of pupils is helpful in developing morale, in 
the management of the classroom, and in the development of 
psychologically meaningful pupil groups. 

Recognition and observation of pupils who receive few 
choices or none at all may alert the teacher to group or indi- 
vidual problems. The teacher who has identified pupils of this 
type and who is aware of their preferences in the group is fre- 
quently able to help such pupils attract the attention and re- 
spect of others. This can sometimes be accomplished through 
judicious grouping or through capitalizing upon a special skill 
or hobby to bring a pupil into the group. 

In classes in which pupils are organized into almost mu- 
tually exclusive groups, sociometric-test data may indicate 
linkages by means of which the teacher can encourage more 
expansive patterns of social interaction. The pupils' choice pat- 
terns also suggest possibilities for improved group activities 
and the formation of more harmonious working groups. 

The first step in putting sociometric data to work is to act 
upon the results in terms of the purpose for which the test was 
given. If the test question referred to seating arrangement, the 
class should be reseated in a pattern closely approximating 
the choice patterns revealed by the test. Ordinarily, some 
compromises will be necessary. If the question referred to the 
formation of working groups, such groups should be organ- 
ized on the basis of the findings. Again, ingenuity will be re- 
quired in working out acceptable compromises. The following 
suggestions may help the teacher utilize test results: 

1. If possible, give the unchosen pupil his first choice. 

2. When choices are mutual, give the pupil his highest re- 
ciprocated choice. 

3. If the pupil has chosen only individuals who have not 
chosen him in return, give him his first choice as an associate 
if there is a possibility that he will be accepted by this indi- 
vidual. 

4. Do not place any pupil with a pupil who may actively 
reject him. 

5. In forming groups on the basis of the results of socio- 
metric tests, provide each pupil with an associate of his 
choice. If possible, organize groups in such a way that their 
members are linked together by the choice patterns. 

6. Provide for leadership which will be recognized and 
accepted by group members. 
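Suggestions 1 through 4 can be read as a rule for deciding which associate to try first for each pupil. The Python sketch below is a loose illustration of that reading, not a complete grouping procedure: the function name `placement_preference` and the `rejections` argument (pairs of rejecting pupil and rejected pupil, which the teacher would have to supply from observation) are assumptions, and the judgment in suggestion 3 about whether a pupil is likely to be accepted is left to the teacher.

```python
def placement_preference(pupil, choices, rejections=frozenset()):
    """Suggest the associate to try first for `pupil`.

    Classmates known to reject the pupil are skipped (suggestion 4). Among
    the rest, the highest reciprocated choice is preferred (suggestion 2);
    if no choice is reciprocated, the pupil's first remaining choice is
    returned (suggestions 1 and 3). Returns None if nothing is left.
    """
    own = [c for c in choices.get(pupil, []) if (c, pupil) not in rejections]
    reciprocated = [c for c in own if pupil in choices.get(c, [])]
    if reciprocated:
        return reciprocated[0]
    return own[0] if own else None


# Hypothetical group (illustrative names only); "Dot" is chosen by no one.
choices = {
    "Ann": ["Bea", "Cal"],
    "Bea": ["Cal", "Ann"],
    "Cal": ["Bea", "Ann"],
    "Dot": ["Ann", "Bea"],
}
for name in choices:
    print(name, "->", placement_preference(name, choices))

# If observation suggested that Ann actively rejects Dot, the pair is avoided.
print(placement_preference("Dot", choices, rejections={("Ann", "Dot")}))
```

A rule of this kind only proposes a starting point for each pupil; forming whole groups that are linked by the choice patterns and that have accepted leadership (suggestions 5 and 6) still calls for the compromises and ingenuity mentioned above.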

The sociometric test is a tool which provides the teacher 
with information regarding the interrelationships of individ- 
uals in the group. Like other test results, this information is of 
greatest value when it is used in conjunction with data ob- 
tained from other sources. It gives impetus to the teacher's ob- 
servations of the social interaction of pupils, and it may form 
the basis for the development of meaningful and satisfying so- 
cial and learning experiences in the classroom. 


SUMMARY 


The classroom is a social situation which has a significant 
impact upon the learning activities and social development of 
pupils. The sociometric test provides a means of studying the 
social interactions of persons in groups. The individual taking 
the test is asked to select one or a number of companions for 
a situation in which social relationships are important. 
Choices are tabulated; graphic representations may be devel- 
oped for the group or for the individual, and the results may 
be utilized as a basis for grouping pupils for the specified situ- 
ation or activity. 


The utilization of sociometric devices in the classroom
provides the teacher with information regarding (1) the
acceptance status of pupils in the group, (2) the lines of attraction
among pupils, and (3) cleavages within the group. This
information may be of value in grouping pupils for work or
play, in studying the problems of individual pupils, in
developing pupil leadership in classroom activities, and in the
improvement of relationships among members of the group.
Sociometric data help the teacher create the appropriate social
setting for learning.


STUDY AND DISCUSSION EXERCISES 


1. Discuss some reasons why teacher and pupil choices of
leaders for classroom activities sometimes differ. What values do
you see in pupil selection of classroom leaders and associates?

2. List some classroom situations which might form the basis
for sociometric questions.

3. What values might the teacher derive from the use of
sociometric tests which refer to extraclassroom situations?

4. What particular advantages might a teacher who is new to a
group derive from sociometric data? What difficulties
might such a teacher find in the interpretation of the data?

5. What advantages are there in keeping records of the results
of successive sociometric tests?

6. If a classroom is available to you, arrange to administer a
sociometric test. Develop a sociogram on the basis of the results.

7. What methods might the teacher use to find an explanation
of the results of sociometric testing?

This chapter is a readable and authoritative account of the
nature, purposes, and values of the sociometric method as it ap-
plies to the classroom situation.
Jennings, H. H.: Sociometry in Group Relations, Washington: American
Council on Education, 1948. 
This study is devoted to a description of the sociometric method 
and enlarges upon its uses and applications in the classroom 
situation. 
Moreno, J. L.: Who Shall Survive? Washington: Nervous and 
Mental Disease Publishing Company, 1934. 
This text includes a description and an account of the rationale 
of the original experimentation which introduced the sociometric 
method. 


Northway, M. L.: A Primer of Sociometry, Toronto: University 
of Toronto Press, 1952.
This booklet presents a detailed account of the sociometric 
test and of methods of organizing and presenting the results. 
A discussion of interpretations and uses is included. 
Taba, H., E. H. Brady, J. T. Robinson, and W. E. Vickery: 
Diagnosing Human-relations Needs, Washington: American Coun- 
cil on Education, 1951. 
Chapter V presents an excellent account of the procedures in- 
volved in the use of the sociometric method. 
Thomas, R. M.: Judging Student Progress, New York: Longmans, 
Green & Co., Inc., 1954. 
Chapter 9 describes the sociometric technique as a basis for the 
evaluation of social relationships in the classroom. 


CHAPTER TEN 


Studying Interests and Attitudes 


Interests and attitudes offer clues to the understanding of
the behavior of the individual, since both are closely related
to emotional life. They determine essential aspects of
motivation and can facilitate or interfere with the efficiency of
learning in the classroom, for a learning program geared to the
interests of the pupils becomes vital and meaningful to them.
Favorable attitudes toward the school, the learning task, the
teacher, and the group facilitate the pupil’s attainment of
worthwhile educational goals. Adverse attitudes, on the other
hand, are likely to result in discord, apathy, rebellion, tru- 
ancy, and other behavior that interferes with the attainment 
of desirable educational objectives. 

Interests and attitudes are learned. Individuals develop
attractions or aversions as a result of environmental
opportunities, personal needs, and experiences. For example, the
individual may develop a favorable attitude toward reading or an
aversion to reading in accordance with the opportunities,
satisfactions, failures, or frustrations with which reading
becomes associated in his experience. If the pupil has developed
positive attitudes, his energy can be readily directed toward
reading experiences. If he has developed negative attitudes, he
is likely to avoid reading situations.

Parents and other individuals in the child's immediate
environment influence the development of his interests and
attitudes. Hence, attitudes toward school and school
experiences, racial and religious groups, teachers, and other
children are frequently created before the child reaches school
age. The teacher's task of knowing the pupil and working
with him is facilitated by adequate understanding of his
attitudinal and interest patterns.


METHODS OF STUDYING INTERESTS 


The investigation of pupil interests may be carried out by 
means of observation techniques, interviews, direct questions,
a check list, or an interest inventory. Studying interests by ob- 
servation offers certain advantages over the use of interviews 
or inventories, since it permits the teacher to study his pupils 
under conditions which are natural rather than artificial. The 
classroom affords many and varied opportunities to observe 
behavior; the method of observation can be adapted to many 
situations, and records can be kept over long periods of time.
Planned and purposeful observation is likely to arouse the 
teacher's interest in and increase his understanding of pupil 
behavior. 

However, the teacher must be aware of the limitations of 
observational methods. If the observations are carried on with
reference to too many situations or too many pupils at one 
time, they may become extremely time-consuming. Probably 
it is wisest to begin by keeping relatively complete records on 
a few pupils who present motivational problems. When this 
procedure is used, the observations may well be used to sup- 
plement data derived from other, less time-consuming meth- 
ods of studying interests. 

Further limitations of observation as a method of studying
the child are its subjectivity and the need for skill on the part
of the teacher. The attitudes of the teacher, the range of
situations in which he observes behavior, the significance he
attaches to specific incidents, and the degree of objectivity he
attains in recording behavior all influence the validity of the
observations.


Interviews offer possibilities for the study of interests, since
pupils are ordinarily eager to discuss hobbies and other
activities which are of interest to them, when they find an adult
who appears to be interested, understanding, and willing to
listen attentively. Pupil interests form a good basis for
beginning an interview which may actually have some purpose
other than to investigate interests. The teacher may acquire
information about feelings and attitudes as he encourages the
pupil to talk about his after-school activities, his favorite
games or play activities, his hobbies, trips he has taken, his
most interesting experiences, the books he likes, his favorite
radio or television programs, movies he has enjoyed, and so
on. Such interviews ordinarily prove fruitful in developing
friendly relationships and increasing understanding of pupil
feelings and attitudes as well as locating interests. Interviews
of this type help the teacher to plan experiences for pupils
which will utilize their interests advantageously.

In order to save time and gather data from the entire
classroom group at one time, the teacher may wish to ask pupils to
describe their preferred activities or to name their favorite
school subjects, games, hobbies, reading material, or
recreational activities. Written reports of this type provide the
teacher with a wealth of information which may be utilized to
good advantage in the classroom.


The more formal type of interest questionnaire may also
help the teacher to know his pupils better. The questionnaire
has the advantage of providing an economical method of
gathering the desired data, but the method is subject to
certain limitations. For instance, the questions may or may not
be meaningful to the pupil in terms of his experience and
information; and he may or may not be willing to cooperate
fully in indicating his real preferences. The questionnaire, 
however, may include a wide range of statements related to 
interest and hence may represent a broad coverage of inter- 
est possibilities. The teacher may develop questionnaires to 
serve his specific purposes, or he may use a suitable published 
questionnaire. Interest inventories have been developed to fa- 
cilitate the study of preferences among vocations, academic 
areas, extracurricular and recreational activities, and personal 
and social activities. The accompanying examples have been 
selected from a few published inventories to indicate some of 
the methods and instruments that have been devised for the 
study of interests. 

A relatively informal listing of seventy-four interests and
activities accompanies the California Test of Personality.¹


The directions and a few items will serve to indicate the gen- 
eral nature of the inventory. 


Interests and Activities. First look at each thing in this test.
Make a circle around the “L” for each thing that you like or would
like very much to do. Then make a circle around the “D” for
things you really do.


1. L D Play the radio
2. L D Read stories
3. L D Go to the movies
4. L D Read comic strips
5. L D Work problems
6. L D Study history
* * *
70. L D Go to parties
71. L D Go to dances
72. L D Be an officer of a club
73. L D Be a class officer
74. L D Go camping

¹ W. W. Clark, E. W. Tiegs, and L. P. Thorpe, California Test of
Personality, Los Angeles: California Test Bureau, 1942.

The items of the inventory are arranged according to the 
amount of activity involved, proceeding from the more indi-
vidual and passive interests to those which are predominantly
social and active in nature.

Very few interest-test materials for elementary school
pupils have been published. However, among the few published
inventories of children’s interests is one entitled What I Like to
Do,² which is designed for pupils in grades four through seven.
The authors suggest that the inventory may be useful as an aid
in (1) curriculum development, (2) selection of instructional
materials, (3) parent conferences, (4) understanding of
individual differences among pupils, (5) planning for pupils in
instructional, recreational, and educational areas, and (6)
pupil guidance. The interest areas covered are: art, music,
social studies, active play, quiet play, manual arts, home arts,
and science. Interest profiles provide percentile norms for
boys and girls from grades four through six. Pupil responses
are indicated by a cross in answer boxes under No, ?, or Yes
for each item. The following are illustrative sample items:


Would You Like to . . .                  No    ?    Yes

1. Eat ice cream
2. Play “Crack the Whip”
3. Walk in the woods
4. Sleep in a tent


The Strong Vocational Interest Blank⁴ is an example of a
carefully standardized interest inventory. Separate forms are 

² Louis P. Thorpe, Charles E. Meyers, and Marcella R. Sea, What
I Like to Do: An Inventory of Children’s Interests, Chicago: Science
Research Associates, Inc., 1954.
³ Thorpe et al., Examiner Manual for What I Like to Do: An
Inventory of Children's Interests, p. 3.
⁴ E. K. Strong, Jr., Vocational Interest Blank, Stanford, Calif.:
Stanford University Press, 1938.

available for men and women. The individual is asked to
indicate whether he likes, dislikes, or is indifferent to each of a
list of occupations, amusements, school subjects, activities,
and groups of persons. Included also are scales in which
activities are ranked in order of preference and scales in which
a comparison of interest between two items is requested. The
inventory includes a self-rating of abilities and characteristics.
The individual checks L, I, or D (like, indifferent, or
dislike) for (1) occupations such as advertiser, architect,
army officer, artist; (2) amusements such as golf, fishing,
tennis; (3) school subjects such as algebra, agriculture,
arithmetic, art; (4) activities such as repairing a clock, making a
radio set, interviewing clients; (5) people such as progressive
people, conservative people, energetic people, people who
borrow things. Scores on the blanks indicate whether or not
the subject has patterns of interests similar to those of persons
who are engaged in given occupations. The Strong inventory
has been found to be useful as one source of data in counsel- 
ing with high school and college students relative to academic 
and vocational choices. 


The Occupational Interest Inventory⁵ represents a
somewhat different approach to the study of occupational
preferences. The individual is asked to indicate his preferences
among paired activities such as the following:⁶


1
A. Deliver groceries or meat to homes.
D. Wrap articles in the shipping department of a store.

8
B. Raise pedigreed dogs, horses, or other animals.
C. Operate lathes, drill presses, or planes.

D. Direct the sales policies for a large store or firm.
E. Write stories or articles for important magazines.

⁵ E. A. Lee and L. P. Thorpe, Occupational Interest Inventory, Los
Angeles: California Test Bureau, 1944.
⁶ Ibid., Intermediate Inventory, Form A.


Scores on the Occupational Interest Inventory are related 
to six fields of interest: personal-social, natural, mechanical, 
business, the arts, the sciences. Scores are also available for 
types of interest such as verbal, manipulative, and computa- 
tional. The last section of the inventory is designed to identify 
the level of the individual's interest which may be associated 
with tasks at the routine level, the skilled levels, or a level
which requires expertness, skill, judgment, and perhaps
supervisory or administrative responsibilities. The test
appears in forms adapted to the upper elementary or junior
high school age and to the high school, college, and adult
levels.

The Kuder Preference Record⁷ appears in two forms,
vocational and personal, which differ in emphasis and purpose.
The vocational inventory provides a profile of scores in ten
interest categories: outdoor, mechanical, computational,
scientific, persuasive, artistic, literary, musical, social-service,
and clerical. The personal form of the Preference Record is
similar in format to the vocational form. It provides scores
for different types of personal and social activities such as
working with ideas, being active in groups, avoiding conflicts,
directing or influencing others, being in familiar and stable
situations.

The Kuder inventories utilize a forced choice technique in
which the individual checks the best and least liked of three
possibilities presented in each item. For example, the
individual indicates the most and least preferred of the following:⁸

a. Visit an art gallery
b. Browse in a library
c. Visit a museum

⁷ G. F. Kuder, Kuder Preference Record, Chicago: Science
Research Associates, Inc., 1948.
⁸ Kuder Preference Record, Personal Form AH, Chicago: Science
Research Associates, Inc., 1948.


An inventory of a somewhat different type is the Dunlap
Academic Preference Blank.⁹ This is a check list designed
for use with pupils from grades 6 to 9. It consists of ninety
words and phrases representative of eight academic areas of
elementary schoolwork. Pupil responses indicate liking, dis- 
like, indifference, or absence of familiarity with the various 
areas. 

Interest-test scores have generally been found to possess a
relatively high degree of reliability. Administration of the tests
to seventeen-year-old students, to college students, and to
adults has demonstrated that the scores have a considerable
degree of stability.¹⁰ However, the interest scores of high
school students are not so stable as those of older
individuals.¹¹

The constructive use of interest inventories requires an ap- 
preciation of the limitations of the instruments. The teacher 
should bear in mind the following limitations: (1) Answers 
depend on the individual's present status. Since interests grow
out of experience, it is possible that future interests may
develop in other directions. It is entirely possible that success
in some activity which the pupil is required to pursue may
engender an interest; it is also possible that such required
participation, especially if the student is not successful in the

⁹ J. W. Dunlap, Dunlap Academic Preference Blank, Yonkers, N.Y.:
World Book Company, 1940.

¹⁰ E. K. Strong, Vocational Interests of . . . , Stanford, Calif.:
Stanford University Press . . . ; . . . and W. C. McCall, “Some Data
on the . . . ,” Educational and Psychological Measurement, 1:253-268,
1941.

¹¹ L. Canning, K. Van F. Taylor, and H. D. Carter, “Permanence
of Vocational Interests of High School Boys,” Journal of Educational
Psychology, 32:481-494, 1941.

activity, may inhibit the development of interest. (2)
Published inventories do not necessarily include the whole range
of possible interests. The score indicates simply that of all the 
interests represented on the inventory the subject is most in- 
terested in a given area, not that this area is necessarily his 
greatest interest. If another area had been represented, his 
highest score might be different. (3) The tests do not indicate 
potentiality. If a person has not yet engaged in a given ac- 
tivity, his responses simply indicate that at present he has not 
become interested. There is no indication that familiarity will
not generate interest.

The interest inventory does, however, provide an effective
means of gathering data within short periods of time and
serves as a tool which may be helpful to the teacher in a
variety of ways. At the secondary school level, interest-test
results provide useful data in educational and vocational
guidance, where test results are best used in conjunction with
interviews designed to assist the student to reach suitable
decisions. For guidance purposes the results of interest tests
should be used in conjunction with other information in
reaching a decision. Basing academic and vocational advice solely
on the results of questionnaires is hazardous. As a starting
point for an interview, however, these instruments are
commendable, and the results of such tests are useful at the
elementary and secondary school levels in curriculum and
instructional planning and in working with pupils who present
behavioral or motivational problems.


METHODS OF STUDYING ATTITUDES 


Attitudes are predispositions or tendencies to react in
certain characteristic ways toward objects, creatures, individuals,
institutions, races, religions, or practices. Attitudes may be
studied by means of observation, interviews, ratings, and
various types of attitude and opinion scales. The teacher will
find the study of pupil attitudes rewarding because it yields in- 
creased understanding of pupils and assistance in the planning 
and conduct of the instructional program and the evaluation 
of educational outcomes in terms of program objectives. 
Observation. Observational methods may be utilized as a 
means of gathering behavioral data from which pupil attitudes 
may be inferred. However, the method is subject to definite 
limitations. Personal attitudes and biases are likely to influence 
teachers' interpretations of behavior. For this reason it is ad- 
visable to record observed behavior as accurately and objec- 
tively as possible over a period of time before attempting 
interpretations. Since situational factors influence behavior, 
the record should include: (1) a reference to the specific situa- 
tion, (2) a description of the circumstances associated with 
the behavior, and (3) a factual statement of the behavior ob- 
served. Over a period of time the teacher may gather a series 
of records from which valid inferences regarding behavior and 
attitudes may be drawn. The following points are fundamental
to the development of adequate observational (or anecdotal)
records:

1. Note the setting in which the behavior occurred, e.g.,
the classroom, the playground, the halls.

2. Record the activity in progress, e.g., the class,
extracurricular activity, special program, or period between
classes.

3. Note special circumstances, e.g., the individuals
involved, prior events which may have been influential,
plans or directions, if any, which were operating at the
time.

4. Describe the behavior concisely and factually without
interpretative terms such as “bad,” “mean,” “good.”

5. Sample behavior over a period of time.

6. Interpret cautiously on the basis of the objective data
which have been recorded. The behavior description
and the interpretation are not combined.


These requirements suggest that it is probably best to begin 
by selecting one or two pupils for intensive study rather than 
attempting to record the behavior of a considerable number. 
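
One way to keep the six points above in view is to give the anecdotal record a fixed form. The following Python sketch is only an illustration: the class name `AnecdotalRecord`, its field names, and the sample entry are hypothetical, and the short list of interpretative terms contains only the three words named in point 4. The record stores the behavior description and the interpretation in separate fields and flags interpretative terms that have slipped into the description.

```python
from dataclasses import dataclass
from datetime import date

# Only the three terms named in the text; a teacher would extend this list.
INTERPRETATIVE_TERMS = {"bad", "mean", "good"}

@dataclass
class AnecdotalRecord:
    pupil: str
    observed_on: date
    setting: str          # point 1: where the behavior occurred
    activity: str         # point 2: the activity in progress
    circumstances: str    # point 3: persons involved, prior events, directions
    behavior: str         # point 4: factual description only
    interpretation: str = ""  # point 6: kept apart from the description

    def interpretative_terms_used(self):
        """Return interpretative terms found in the behavior description."""
        words = {w.strip('.,;:!?"').lower() for w in self.behavior.split()}
        return sorted(words & INTERPRETATIVE_TERMS)

# Hypothetical entry; the date and details are invented.
record = AnecdotalRecord(
    pupil="Pupil A",
    observed_on=date(1957, 3, 4),
    setting="playground",
    activity="period between classes",
    circumstances="two classmates waiting in line ahead of him",
    behavior="Pushed ahead in line and was mean to a classmate.",
)
flagged = record.interpretative_terms_used()
```

Here `flagged` contains "mean", which suggests that the description should be rewritten in factual terms before any interpretation is attempted. Point 5 is met by collecting a series of such records over a period of time.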

Attitude Scales. A number of attitude scales have been de- 
veloped, using, for the most part, one of two basic methods. 
One of these approaches, devised by Thurstone, involves the
placement of statements upon a continuous scale from
extremely favorable to extremely unfavorable. Each item or step
on the scale is assigned a carefully developed weighted-score
value. The subject indicates the statements with which he
agrees and disagrees, and a score is derived. Representative
statements from the Thurstone scale for measuring attitudes
toward communism are:¹²

A. Communism is the solution to our present economic
problems (9.1).
B. Both the evils and benefits of communism are greatly
exaggerated (5.4).
C. Police are justified in shooting down Communists (0.3).

Statement A presents a view highly favorable to communism.
Statement C represents extreme dislike, whereas statement B
is considered to reflect a relatively neutral attitude. The
median scale value of the statements checked by the subject
determines his attitude score on the scale.
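
The scoring rule just described can be shown with a short Python sketch. It uses only the three statements and scale values quoted above; the function name `thurstone_score` is hypothetical, and an actual scale contains many more statements than three.

```python
from statistics import median

# Scale values of the three sample statements quoted in the text.
SCALE_VALUES = {"A": 9.1, "B": 5.4, "C": 0.3}

def thurstone_score(checked, scale_values=SCALE_VALUES):
    """Return the median scale value of the statements the subject checked."""
    if not checked:
        raise ValueError("the subject must check at least one statement")
    return median(scale_values[statement] for statement in checked)

# A subject who checks statements A and B.
score_ab = thurstone_score(["A", "B"])
# A subject who checks all three statements.
score_abc = thurstone_score(["A", "B", "C"])
```

With an even number of checked statements, `statistics.median` averages the two middle values, so the subject who checks A and B receives a score midway between 9.1 and 5.4.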

Utilizing the technique outlined above, Thurstone and 
others have devised scales for the measurement of attitudes 
toward war, the Negro, the Constitution, the law, freedom of 
speech, labor unions, the treatment of criminals, and so on.

¹² L. L. Thurstone, “Attitude toward Communism,” Scale No. 6,
Form A, Chicago: University of Chicago Press. (Copyright, 1931, by
the University of Chicago Press.)

Utilizing a similar technique, Remmers and others¹³ have
developed general scales designed to measure attitudes toward
any person, group, institution, or practice. Excerpts from the
Scale for Measuring Attitude toward Any Institution,
developed by Ida B. Kelly and edited by Remmers, will indicate
the nature of these scales:

1. Is perfect in every way.
2. Represents the best thought in modern life.
3. Is a strong influence for right living.
4. Is valuable in creating ideals.
5. Aids the individual in wise use of leisure time.

The subject is asked to check each statement with which he
agrees. The results of the Remmers scales are comparable to
those of the more specific scales of Thurstone.¹⁴


Among the many scales developed by Remmers and his
associates are scales which indicate attitudes toward:

1. Any disciplinary procedure (V. R. Clause).

2. Any elementary teacher (M. Amatora). 

3. Any practice (H. W. Bues). 

4. Any school subject (E. B. Silance). 

5. Any proposed social action (D. M. Thomas). 

6. Any teacher (L. B. Hoshaw).

7. Any vocation (H. E. Miller). 

Another procedure for the measurement of attitudes has
been proposed by Likert.¹⁵ In the Likert scales, each statement
represents either a favorable or an unfavorable attitude.

¹³ H. H. Remmers and N. L. Gage, Educational Measurement and
Evaluation, rev. ed., New York: Harper & Brothers, 1955, pp. 387-
389.

¹⁴ H. H. Remmers, “Generalized Attitude Scales: Studies in Social-
psychological Measurements,” in Studies in Higher Education, no. 26,
Lafayette, Ind.: Purdue University, 1934, pp. 7-17.

¹⁵ R. Likert, “A Technique for the Measurement of Attitudes,”
Archives of Psychology, 22 (140), 1932.

Strength of reaction toward each item is indicated along a
scale running from strongly agree to strongly disagree. Favor- 
able attitudes are reflected in high scores and unfavorable at- 
titudes in low scores. Each item is carefully selected and 
tested. The procedures used in developing the Likert scales 
are not so time-consuming as those required for the Thurstone 
scales, yet the Likert scales appear to be equally reliable.¹⁶
The Likert method requires subjects to respond to all items 
of the scale and has some advantages in terms of possibilities 
of analysis of the results. 
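
A minimal Python sketch may make the Likert procedure concrete. Two features are assumptions made for illustration and are not stated in the passage above: that five response categories are used, and that the subject's score is the sum of his item values. The function name `likert_score` and the sample items are likewise hypothetical. Unfavorable statements are scored in reverse, so that a high total always reflects a favorable attitude.

```python
# Assumed five-point response scale, least to most agreement.
RESPONSES = [
    "strongly disagree",
    "disagree",
    "undecided",
    "agree",
    "strongly agree",
]

def likert_score(items, answers):
    """Sum item values, reversing the values of unfavorable statements.

    `items` is a list of booleans (True for a favorable statement);
    `answers` holds the subject's response to each item, in order.
    """
    if len(answers) != len(items):
        raise ValueError("the subject must respond to every item")
    total = 0
    for favorable, answer in zip(items, answers):
        value = RESPONSES.index(answer) + 1  # 1 through 5
        total += value if favorable else 6 - value
    return total

# Hypothetical three-item scale: favorable, unfavorable, favorable.
items = [True, False, True]
answers = ["strongly agree", "strongly disagree", "agree"]
total = likert_score(items, answers)
```

The length check reflects the requirement that subjects respond to all items of the scale. In the example, strong disagreement with the unfavorable statement contributes as much to the total as strong agreement with a favorable one.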

The Scale of Social Distance developed by Bogardus¹⁷ is
an instrument designed to indicate attitudes toward persons of 
various nationalities and races. Seven degrees of closeness are 
represented in the statements of the subject concerning his 
willingness to admit members of a national or racial group to 
(1) close kinship by marriage, (2) his club, (3) his street as 
neighbors, (4) the same occupation as himself, or (5) citi- 
zenship in his country; (6) as visitors only to his country; or 
(7) to exclude them from his country. Although the scale was 
designed for the study of attitudes toward racial and national 
groups, the method is readily adaptable to the study of
attitudes toward members of a variety of religious, social,
political, and vocational groups.
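
The seven degrees listed above lend themselves to a simple tabulation, sketched here in Python. The scoring rule is an assumption made for illustration and is not taken from Bogardus: each group is scored by the closest degree the subject is willing to accept, and the scores are averaged across groups. The function names and the sample responses are hypothetical.

```python
# The seven degrees of closeness, numbered as in the text.
DEGREES = {
    1: "close kinship by marriage",
    2: "membership in his club",
    3: "his street as neighbors",
    4: "the same occupation as himself",
    5: "citizenship in his country",
    6: "visitors only to his country",
    7: "exclusion from his country",
}

def nearest_degree(endorsed):
    """Return the closest (lowest-numbered) degree the subject endorsed."""
    if not endorsed:
        raise ValueError("at least one degree must be endorsed")
    if any(degree not in DEGREES for degree in endorsed):
        raise ValueError("degrees must be numbered 1 through 7")
    return min(endorsed)

def mean_distance(responses):
    """Average the nearest degree over all groups rated by one subject."""
    return sum(nearest_degree(e) for e in responses.values()) / len(responses)

# Hypothetical subject rating two unnamed groups.
responses = {"group X": {3, 4, 5}, "group Y": {6}}
average = mean_distance(responses)
```

A low value indicates willingness to admit members of the groups to close relationships; a high value indicates greater social distance.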

In a study of the development of attitudes toward the
Negro, Horowitz¹⁸ used pictures of Negro and white boys.
Pupils from kindergarten through eighth grade were first asked
to rank the pictures in order of preference. Next they were
asked to use the pictures as a basis for the selection of
companions for various situations and activities. For example,
children were selected as classmates, captain of the team,
luncheon companions, party guests, members of the gang,
neighbors, and so on. Pictures of social situations involving
the two races were also presented to afford further
opportunities for expressions of attitudes. Horowitz found that prejudice
appeared at an early age and that attitude development is
relatively consistent for groups and for individuals.

¹⁶ R. Likert and others, “A Simple and Reliable Method of Scoring
the Thurstone Attitude Scales,” Journal of Social Psychology, 5:228-
238, 1934.

¹⁷ E. S. Bogardus, “Measuring Social Distance,” Journal of Applied
Sociology, 9:299-308, 1925.

¹⁸ E. L. Horowitz, “The Development of Attitude toward the
Negro,” Archives of Psychology, 28 (194), 1936.


An interesting approach to attitude assessment appears in
Minard's study of racial attitudes.¹⁹ Statements were written . . .
proximity to members . . . Filipinos, Chinese, Mexicans, and
Negroes. The situation might center around neighborhood
residence, team or club membership, and so on.

Attitudes toward the self, classmates, home, school, or per- . . .
Personality: . . . be mean, willing to c . . . to take advant . . .

¹⁹ R. D. Minard, “Race Attitudes of Iowa Children,” Studies in
Character, 4 (2), University of Iowa, 1931.

²⁰ W. W. Clark, E. W. Tiegs, and L. P. Thorpe, California Test of
Personality, Los Angeles, Calif.: California Test Bureau, 1942.

. . . prevailing attitudes of pupils toward school subjects, . . .
groups, and practices, and changes of attitude . . .
discussions, interviews, or other techniques. . . .
evaluation of attitudes is in actuality almost . . . ,
since many of our most worthwhile educational
goals are related to the development of pupils’ attitudes. We
call these goals character development, citizenship, moral and
ethical behavior, or social cooperation.


APPLICATIONS IN THE CLASSROOM 


Interests and attitudes are perhaps generally thought of as
sources of motivation for learning. However, motives, values,
attitudes, interests, and ideals which are socially acceptable
and personally satisfying are not only valuable as supports for
academic learning but represent valid educational goals in
themselves. In the evaluation of the educational growth of
pupils, the development of interests and attitudes deserves
careful consideration. 

A study of pupils’ interests by any of the techniques sug- 
gested may provide the teacher with information useful in: 


1. Understanding pupils.

2. Discovering motivational possibilities.

3. Relating teaching to pupils’ interests and experience.

4. Studying and evaluating pupils’ interest changes.

5. Helping pupils to: (a) become aware of their interests,
(b) evaluate their interests, and (c) increase their
understanding of themselves.

6. Stimulating thought and discussion among pupils
concerning the implications of their interests.



Investigations of attitudes provide the teacher with data 
which may be significant in a number of respects. Such data 
may enable the teacher to:

1. Attain an increased understanding of pupils.

2. Attain deeper understanding of pupil behavior. 

3. Develop curricular, field, social, or civic experiences re- 
lated to major educational goals. 

4. Evaluate pupil behavior on a broader basis than that 
of subject-matter attainment.

5. Study attitude change as the result of directed experi- 
ences. 

6. Assess the relative effectiveness of various teaching
methods and techniques as a means of influencing pupil
attitudes.


Data derived from careful studies and evaluations of
interests and attitudes will be of value in compiling cumulative
records and in accurate reporting of educational attainments
not represented in achievement-test results. For example, such
characteristics as cooperation, self-control, self-confidence,
tolerance, optimism, leadership, respect for the rights of
others, respect for the contributions and ideas of others, and
such attitudes as those toward civic affairs and authority form
an essential part of the evaluation of pupil progress and
attainment. A sincere attempt on the part of the teacher to
develop adequate bases for judgment with respect to such
characteristics as those listed above could be expected to improve
the teacher’s understanding of pupil behavior and his
evaluation of pupil status and progress.²¹

SUMMARY 


Interests and attitudes are essential aspects of the emo- 
tional and behavioral life of the individual and are essential 
in motivation and learning. The assessment of interests and 


²¹ For a list of educational objectives and suggested means of
evaluating them, see J. W. Wrightstone, “Measuring the Attainment of
Newer Educational Objectives,” Sixteenth Yearbook of the Department
of Elementary School Principals, Washington: National Education
Association, 1937, pp. 493-501.

attitudes is an important aspect of the evaluation of pupil
progress and attainment with respect to important educational
objectives.

Interests can be studied by a variety of methods: observation,
direct questions, check lists, and interest inventories. A
variety of instruments provide means of gathering data
concerning a broad range of pupil preferences with regard to
school subjects, activities, forms of recreation, hobbies, and
vocations.

Attitudes can be investigated by means of observations,
anecdotal records, interviews, ratings, and attitude and opinion
scales. Data concerning pupil attitudes may contribute to
the understanding of pupils, the planning and conduct of the
instructional program, the evaluation of pupil attainments,
and the development of adequate records and reporting practices.
A number of scales are available for the study of significant
attitudes. In his use and interpretation of these scales,
the teacher should consider the method utilized in the
development of the scale.

Data concerning pupil interests and attitudes may contribute
significantly to the educational program with reference to
instruction, evaluation, planning, recording, and reporting.


STUDY AND DISCUSSION EXERCISES 


1. In what ways is it valuable for the teacher to understand 
techniques for the measurement of interests and attitudes? 

2. Select an interest inventory and suggest specific ways in 
which its results can be of value to the classroom teacher. 

3. List a number of attitudes which you feel are closely related
to the effectiveness of classroom learning. Describe one of these
attitudes in specific behavioral terms. How might the teacher study
this attitude among pupils in his classroom?

4. Select a published interest inventory. Outline the bases for
its development and utilize these to discuss the possibilities of
interpretation of scores which might be derived from this inventory.

5. Indicate as specifically as you can the contributions which
teacher investigations of pupil interests and attitudes can make to
(a) teacher understanding of pupil behavior, (b) the development
and maintenance of cumulative records, (c) pupil-teacher
conferences, (d) parent-teacher conferences, and (e) reports of pupil
progress.

6. Observe pupils in social situations in the classroom and outside
of the classroom and write behavioral descriptions. Make a list
of the social attitudes which appear to be represented in the
behavior observed.


SUGGESTED ADDITIONAL READINGS 


Cronbach, L. J.: Essentials of Psychological Testing, New York:
Harper & Brothers, 1949.
Chapters 15 and 17 include descriptions of methods and instruments
used in the assessment of interests and attitudes, with
suggestions for the applications of results. Chapter 18 is
concerned with observation as a method of studying behavior.

Greene, E. B.: Measurements of Human Behavior, rev. ed., New
York: The Odyssey Press, Inc., 1952.
Chapters 20 and 21 are devoted to the measurement of interests
and attitudes and contain descriptions of instruments and
discussions of methods of assessment.

Jordan, A. M.: Measurement in Education, New York: McGraw-Hill
Book Company, Inc., 1953.
Chapters 16 and 17 present an account of interest and attitude
measurement. Chapter 16 includes a list of published interest
inventories.

Remmers, H. H., and N. L. Gage: Educational Measurement and
Evaluation, New York: Harper & Brothers, 1955.
Chapter 13 presents a discussion of the nature, organization,
and significance of attitudes, and chap. 14 contains a discussion
of methods of studying attitudes and interests.

Super, D. E.: Appraising Vocational Fitness, New York: Harper
& Brothers, 1949.
Chapters 16, 17, and 18 are devoted to a detailed description
of the nature and measurement of interest; the emphasis is
primarily vocational.


CHAPTER ELEVEN 


Rating Techniques in Pupil 
Evaluation 


Some of the most important results of education cannot be
evaluated by the usual paper-and-pencil tests: the acquisition
of effective work habits and study skills, for example, and the
development of acceptable social attitudes and behaviors.
Good work habits, cooperativeness, industry, responsibility,
and citizenship are commonly listed on report cards and
cumulative records and are recognized as standard educational
objectives. Rating methods are among the means of evaluating
pupil progress toward these educational goals. This chapter
describes means of summarizing and recording teacher ratings
and suggests ways of improving the methods which the teacher
may use.


PROBLEMS IN THE USE OF RATING METHODS 


As we have seen, tests are tools to provide data upon
which to base estimates and judgments. A rating represents
an estimate or judgment regarding a pupil characteristic, based
on the teacher's observations of the pupil. Test results imply,
in the nature of the items, certain definitions of intelligence,
achievement, readiness, and so on, and these definitions form
a point of reference for the interpretation of test results. On
the other hand, a rating on citizenship may have no such
definitive point of reference. A teacher or parent may well
ask, "What does the rater mean by citizenship? On what kind
of data is his rating based?" Ratings typically are highly
subjective in nature. They tend to reflect the characteristics of
the rater to almost as great an extent as they do those of the
individual being rated.

The most common sources of error in ratings are inadequate
or inconsistent definition of traits, fixed patterns of
rating, and halo effect. A further problem is lack of consistency
between several ratings of the individual on the same
trait.

Perhaps the key problem in the interpretation of ratings is
the definition of the rated characteristic. Suppose that teachers
A and B are rating pupils on cooperation. The definition
utilized by teacher A may involve a large measure of obedience
or emphasize cooperation with the teacher. Teacher B may
evaluate the same trait almost entirely on the basis of ability
to work cooperatively with other pupils. Ratings of the same
pupils by these two teachers would bear no necessary
relationship to one another. At the same time, a parent attempting
to interpret the ratings might have in mind a definition of
cooperation which differs markedly from those of the two
teachers. Unless traits are clearly defined, ratings may be
meaningless.


The fixed pattern is a common source of error in ratings.
Some raters, for example, are inclined to be consistently
overgenerous in their judgments. This has been termed the
generosity error. A second, and smaller, group of raters
demonstrate a consistent tendency to underrate, and still others
are reluctant to make ratings indicating differences. The
resulting ratings reflect the characteristic evaluative tendencies
of the rater and may have little reference to the actual
characteristics of the individuals being rated. Figure 13
illustrates the possible effects of such rating patterns with
respect to a hypothetical class of twenty students. Teacher A
rates 45 per cent of these pupils as "excellent" or "superior";
he is relatively generous in his ratings. Teacher B is
apparently unable to differentiate among a majority of the
pupils, since he places 70 per cent in a single category in the
center of the scale. This is an instance of the "average" error.
Teacher C rates 55 per cent of the group as "fair" or "poor,"
illustrating the error of underrating. These ratings would be
difficult to interpret apart from a knowledge of the raters and
their characteristic rating patterns.


TRAIT: COOPERATIVENESS

Scale        Excellent   Superior   Good   Fair   Poor
                 %           %        %      %      %
Teacher A       20          25       40     10      5
Teacher B        5          10       70     10      5
Teacher C        5          10       30     35     20

FIG. 13. Per cent of a hypothetical class of twenty pupils placed
by three teachers under each of five levels of a scale for rating
cooperativeness.
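
The comparison in Figure 13 can be tabulated mechanically. The sketch below is only an illustration: the percentages are those of Figure 13, but the two cutoffs (40 and 60 per cent) and the function names are assumptions introduced here, since the text gives no numerical rule for deciding when a pattern is present.

```python
# Percentages from Figure 13 (per cent of the class placed at each level).
LEVELS = ["Excellent", "Superior", "Good", "Fair", "Poor"]
FIGURE_13 = {
    "Teacher A": [20, 25, 40, 10, 5],
    "Teacher B": [5, 10, 70, 10, 5],
    "Teacher C": [5, 10, 30, 35, 20],
}

# Hypothetical cutoffs, chosen only for illustration.
TOP_OR_BOTTOM_CUTOFF = 40  # per cent in the two highest (or two lowest) levels
CENTER_CUTOFF = 60         # per cent in the single middle level


def describe_pattern(percents):
    """Label the rating pattern suggested by one teacher's distribution."""
    top = percents[0] + percents[1]      # "excellent" plus "superior"
    center = percents[2]                 # "good"
    bottom = percents[3] + percents[4]   # "fair" plus "poor"
    if center >= CENTER_CUTOFF:
        return "average error"
    if top >= TOP_OR_BOTTOM_CUTOFF and top > bottom:
        return "generosity error"
    if bottom >= TOP_OR_BOTTOM_CUTOFF and bottom > top:
        return "underrating"
    return "no marked pattern"


def summarize(table):
    """Apply describe_pattern to every teacher in the table."""
    return {teacher: describe_pattern(p) for teacher, p in table.items()}


if __name__ == "__main__":
    for teacher, label in summarize(FIGURE_13).items():
        print(f"{teacher}: {label}")
```

A tabulation of this kind describes only the rater's distribution; it cannot show whether a particular class actually merits generous or severe ratings.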

The halo effect is a further common source of error in
ratings. The teacher forms a general impression concerning
the pupil, and his ratings of the pupil's traits are as likely to
be representative of this general impression as they are of the
specific characteristic being rated. For example, Mary may
have a pleasing appearance and manner. Teacher ratings of
such traits as dependability, emotional stability, and
cooperativeness may be influenced favorably by the teacher's general
impression of Mary rather than by her actual status with
respect to the specific characteristic being evaluated.
Unfavorable general impressions may lead to equally unrealistic trait
evaluations. The influence of the halo effect is illustrated in
Figure 14.


Influence of the Halo Effect

Teacher's general impression influences ratings on:
cooperativeness, emotional stability, dependability.

FIG. 14. Schematic representation of the pervasive influence of halo
effect on ratings on specific characteristics.


Inconsistencies frequently appear in repeated ratings of a
characteristic. That is, two teachers rating a pupil on a given
characteristic may vary markedly in their evaluations. Again,
a teacher may change his rating of a given pupil even in a
short space of time. This lack of consistency or reliability
poses a problem with respect to the interpretation of ratings.
By way of analogy, consider the situation which develops
when two sets of comparable mental- or achievement-test
scores give entirely different pictures of the same person. Is
the difference a result of the type of test used? Can it be
explained in terms of differences in the two testing situations?
Was the subject in poor physical health or emotionally upset
at one of these times? How can the differences be explained?
Similarly, interpretation is difficult when ratings of the same
person differ with respect to identical characteristics. Among
the reasons for inconsistencies may be the characteristics of
the rater, factors in the situation in which ratings are
developed, the nature of the characteristic being evaluated, the
extent of opportunity to observe the individual, and changes
in the individual over the period of time between ratings.
There are many possible explanations.
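
One way to examine the consistency of two sets of ratings is to compare them pupil by pupil. The sketch below is a minimal illustration, not a procedure given in the text: the ratings are hypothetical, and the two indices chosen (per cent of exact agreement and the product-moment correlation) are assumptions about what a teacher might wish to compute.

```python
from math import sqrt


def percent_agreement(first, second):
    """Per cent of pupils given exactly the same rating by both raters."""
    if len(first) != len(second) or not first:
        raise ValueError("both raters must rate the same, non-empty group")
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return 100.0 * matches / len(first)


def pearson_r(first, second):
    """Product-moment correlation between two sets of numerical ratings."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists of at least two ratings")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("a rater who gives every pupil the same rating "
                         "has no variation to correlate")
    return cov / sqrt(var_a * var_b)


# Hypothetical ratings of six pupils on a five-step scale (5 = highest).
teacher_a = [5, 4, 4, 3, 2, 1]
teacher_b = [4, 4, 3, 3, 2, 2]

if __name__ == "__main__":
    print(f"exact agreement: {percent_agreement(teacher_a, teacher_b):.1f} per cent")
    print(f"correlation: {pearson_r(teacher_a, teacher_b):.2f}")
```

The two indices answer different questions: raters may place pupils in nearly the same order and still differ in the step assigned to many of them.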

Ratings based on casual or incidental impressions are notably
unreliable, but ratings based upon careful and systematic
observations of well-defined characteristics can be quite
reliable. Again, certain characteristics are more reliably
evaluated by rating methods than others. Ratings of traits which
can be observed objectively are typically most reliable. Ratings
of general characteristics and traits which involve interaction
with others tend to be least reliable.¹ For example, traits such
as cooperativeness and integrity are relatively difficult to
evaluate by rating methods.


ORGANIZATION OF RATING SCALES 


The usual rating scale presents the rater with a set of
characteristics (such as initiative, responsibility, and social
effectiveness) which are to be evaluated. These traits may or may
not be defined. The rater is asked to assess individuals by
checking a point on a scale representing a level or degree of
the trait. The list that follows indicates various ways in which
the levels or degrees may be indicated:


1. By means of numbers: 1, 2, 3, etc.

2. In terms of frequency of occurrence of the trait: always,
usually, seldom, never.

3. By qualitative terms: excellent, superior, good, fair,
poor.

4. By terms which refer to relative status, e.g., relative to
others: outstanding, above average, average, below
average, inferior.

5. By descriptive terms which apply to each level or step,
e.g.,
   a. Recognized as a leader; assumes leadership willingly.
   b. Accepts leadership when specifically requested to
      do so.
   c. Avoids leadership.

6. By means of coded numbers or letters:
   1 or A represents excellent.
   2 or B represents above average.
   3 or C represents average.
   4 or D represents below average.
   5 or E represents inferior.


¹ H. L. Hollingworth, Judging Human Character, New York:
Appleton-Century-Crofts, Inc., 1922.

Each of these types of organization of rating scales may be
of value as a means of providing the teacher with a frame of
reference or a guide, provided the type of organization is
suited to the trait being rated and the purposes which the
rating is designed to serve. For example, Schedule A of the
Haggerty-Olson-Wickman Behavior Rating Schedules² utilizes
an organization based on the frequency with which a
type of behavior occurs. Four levels of frequency are indicated
for each of the fifteen traits listed. A weighted score or
quantitative value has been established for each rating. This
quantitative value is based on the seriousness and frequency of
occurrence of the behavior among school children (Fig. 15).


² M. E. Haggerty, W. C. Olson, and E. K. Wickman, Haggerty-
Olson-Wickman Behavior Rating Schedules, Manual of Directions,
Yonkers, N.Y.: World Book Company, 1930.


                              Frequency of occurrence
Behavior          Has never   Has occurred    Occasional   Frequent     Score
problem           occurred    once or twice   occurrence   occurrence
                              but no more
Disinterest in
  school work         0             4              6            7
Truancy               0            12             18           21

FIG. 15. Organization of Schedule A of the Haggerty-Olson-Wickman
Behavior Rating Schedules, indicating basis in frequency of
occurrence and showing weighted scores.
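
The weighted scores of Figure 15 lend themselves to a simple lookup. The sketch below is offered only as an illustration: the weights are the two rows shown in Figure 15, while the short frequency labels, the function names, and the practice of summing the weights into a total are assumptions made here rather than directions taken from the manual.

```python
# The four frequency levels of Schedule A, in the order shown in Figure 15.
# The short labels are hypothetical abbreviations of the column headings.
FREQUENCY_LEVELS = ("never", "once or twice", "occasional", "frequent")

# Weighted scores for the two behavior problems shown in Figure 15.
WEIGHTS = {
    "Disinterest in school work": (0, 4, 6, 7),
    "Truancy": (0, 12, 18, 21),
}


def weighted_score(problem, frequency):
    """Look up the weighted score for one behavior problem at one frequency."""
    if problem not in WEIGHTS:
        raise KeyError(f"no weights recorded for {problem!r}")
    if frequency not in FREQUENCY_LEVELS:
        raise ValueError(f"unknown frequency level {frequency!r}")
    return WEIGHTS[problem][FREQUENCY_LEVELS.index(frequency)]


def total_score(ratings):
    """Sum the weighted scores over all rated problems (an assumed convention)."""
    return sum(weighted_score(p, f) for p, f in ratings.items())


if __name__ == "__main__":
    pupil = {
        "Disinterest in school work": "occasional",
        "Truancy": "once or twice",
    }
    print(total_score(pupil))
```

The example shows why the weights matter: a single instance of truancy counts for more than occasional disinterest in school work.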


Schedule B of the Haggerty-Olson-Wickman Scales combines
descriptive categories and quantitative values. The
weighted scores of Schedule B have been assigned on the basis
of relationships between ratings on each of thirty-five traits
and the behavior tendencies listed under Schedule A.³ Figure
16 illustrates the organization of Schedule B. These schedules
represent an attempt to assign meaningful quantitative
values to rating categories; they also illustrate the definition of
scale steps in terms of frequency of occurrence and descriptions.


Is his attention sustained?                              Score ____

Continually   Frequently   Usually          Wide-awake   Keenly alive
absorbed      becomes      present-minded                and alert
in himself    abstracted
   (5)           (4)          (…)              (…)          (…)

FIG. 16. Organization of Schedule B of the Haggerty-Olson-Wickman
Behavior Rating Schedules. Descriptive categories are accompanied
by weighted score values.


³ Ibid.


IMPROVING RATINGS OF PUPIL
CHARACTERISTICS


In the use of rating devices there is no substitute for con- 
scientiousness, skill, and objectivity on the part of the rater. 
However, work with rating instruments indicates that evalua- 
tions based on these devices may be improved through the fol- 
lowing procedures: 

1. Selecting carefully the traits which are to be evaluated. 

2. Defining the traits. 

3. Describing the traits. 

4. Establishing a basis for judgment. 

5. Establishing scale steps. 

6. Organizing the rating instrument. 

1. Selecting the traits. In selecting a list of traits to be
evaluated by rating methods, the teacher should consider the
purpose of the evaluation and the extent to which each trait is
related to educational objectives. Carefully developed ratings
of significant pupil characteristics provide essential
information in the evaluation of pupil status and progress. Such ratings
will increase the value of pupil records and reports and may
play an important role in the teacher's instructional planning.

The traits selected for use in a teacher-developed rating
instrument should be (a) relatively few in number, (b) critically
related to the teacher's purposes in rating, (c) as clearly
differentiated as possible, and (d) capable of clear and precise
definition, preferably in terms of observable behavior.

2. Defining the traits. A majority of the characteristics
which the teacher evaluates by means of ratings may be defined
in a number of different ways. Since teacher ratings may
be a means of providing information to other persons, such
as the pupil, parents, or other teachers and professional
persons, it is important that the rated characteristics be defined
carefully, objectively, and specifically. Such definitions are
necessary to interpretation of the evaluation represented by
the rating. Probably the ratings that lend themselves best to
accurate interpretations are those that can be described in
terms of observable behavior, for many traits (e.g.,
"cooperativeness") are subject to a number of possible definitions
depending on the individual who interprets the term. However,
the teacher can clarify his intention by means of a recorded
definition such as the following: "Cooperativeness: The pupil's
ability to work harmoniously with his classmates in classroom
activities and projects."

Certain advantages are achieved through the use of this
type of definition. The ratings refer to classroom activities
that can be observed by the teacher. There is no implication
that the rating applies to the pupil's behavior in situations in
which the teacher has limited opportunity to conduct systematic
observations. Again, the reference applies to cooperation
with other pupils rather than cooperation with the teacher.

The pupil traits listed on report cards and other rating
instruments are often inadequately defined on the report or
instrument. In such instances it is usually advisable for the
teacher to decide upon a clear definition and to record it.

3. Describing the traits. As we have seen, a clear definition
of traits is helpful in rating. However, definitions are general
descriptions or summary statements, and ratings are more
likely to be reliable and meaningful when they are based on
specific behaviors which serve as indicators of the characteristic
being assessed. For example, having defined cooperativeness,
the teacher lists pupil behavior which is related to
the definition. The teacher's worksheet might look like this:


Cooperativeness: The pupil's ability to work harmoniously with
his classmates in classroom activities and projects.

Behaviors:
1. Participates actively in group planning, work, and discussion.
2. Brings materials and ideas to class to share with others.
3. Listens to the ideas and experiences of others.
4. Respects the opinions of others.
5. Shares his opinions with members of the group.
6. Abides by the decisions of the group.
7. Works with class officers, committees, and group leaders.
8. Does not needlessly disturb the work of others in the group.
9. Carries his share of responsibility for the work of the group.



Such a list of behavioral indicators provides a relatively tangi- 
ble basis for the observation and evaluation of the charac- 
teristic. 

4. Establishing the basis for judgment. Judgments with
respect to any characteristic may be either absolute or relative.
The statement, "This object is five feet high," illustrates an
absolute judgment. Linear and quantitative measures of
length, height, weight, etc., are based on standard units which
give them a common meaning. A rating scale involving absolute
judgments asks the rater, in effect, such questions as
"Is he cooperative? To what degree?" This is probably the
type of rating scale most commonly used. However, one
might ask whether all raters use the same standard or unit of
measurement as a basis for their judgments.

In utilizing relative methods, the rater decides whether the
pupil is relatively more or less cooperative than others. The
comparison may be limited to members of an age group,
grade level, or classroom group. Interpretations based on
ratings of this type specify or imply the limitation that the
pupil is being compared with others of a group. Thus the ratings
of pupils will be dispersed around the level of average or
typical performance for the specified group. The technique is
roughly similar to that involved in the development of
"norms"; that is, the pupil is rated in terms of his status with
respect to his group rather than in terms of a "standard" or
"expected" level of attainment. The rating scales presented in
Figure 17, prepared for the Springfield, Missouri, Senior High
School, are examples of relative scales.


Behaviors Which Indicate That One Is "Considerate of Others"*

Teacher ______  Pupil ______
"Average" means that the pupil exhibits the behavior indicated to
about the same degree as the average pupil of his grade level.

1. Shares materials willingly and properly
2. Observes normal courtesies in personal relationships with others
3. Participates in and makes positive contributions to group
   activities
4. Returns materials to proper places after use
(Other behaviors; write in and rate)


Behaviors Which Indicate That One Is "Not Considerate of Others"*

Teacher ______  Pupil ______
"Average" means that the pupil exhibits the behavior indicated to
about the same degree as the average pupil of his grade level.

1. Interrupts others who are speaking
2. Doing such things as cleaning fingernails, cleaning out purse,
   combing hair, etc., while other students are making reports
3. Crowding ahead of others in lunch line, or when coming into or
   leaving classroom
4. Cutting across, shoving, or crowding in corridors
5. Loud and boisterous in corridors
(Other behaviors; write in and rate)


* Springfield, Missouri, Senior High School, 1949 (mimeographed).
FIG. 17. Examples of rating scales.


192 EVALUATION TECHNIQUES FOR CLASSROOM TEACHERS 


A special type of relative scale is the "man-to-man" scale.
In developing this type of instrument the teacher selects certain
pupils as "standards" for the various steps of each trait
scale. One pupil is selected to represent each step of the
scale, and others are rated by comparison with these "standards."
This method of establishing a basis for judgment appears
to have definite possibilities for classroom use.

5. Establishing scale steps. A further procedure in establishing
the basis for judgment is developing categories, levels,
or steps which represent the scale for each trait. Ordinarily
we think of any trait variable (such as energy, enthusiasm, or
initiative) as essentially continuous. In practice, however, a
number of areas or "units" are established along the trait scale
as a matter of convenience rather than fact. For example, a
scale for enthusiasm might be represented as follows:


Enthusiasm:

Completely                                          Extremely
indifferent  |----------------------------------|  enthusiastic


The following arrangement is more useful because the steps are
described along the continuum.⁴


Enthusiasm:

Indifferent   Rarely shows   Sometimes      Usually ...     ... with
              enthusiasm     enthusiastic   pep and vigor   great
                                                            enthusiasm


The number of steps included in a scale will be determined,
in part at least, by the purposes for which the ratings are to
be used and by the nature of the trait. In general, however, if
too few steps are included in the scale, it will not accurately
reflect trait differentiations among individuals. Too many
steps, on the other hand, may make the task of the rater
cumbersome or make differential judgments difficult. As with
other evaluative devices, the purpose of rating scales is to
differentiate among individuals in terms of specified
characteristics. More refined and specific differentiations ordinarily
represent a more adequate basis for interpretation and
evaluation.


⁴ Summary Behavior Rating Scale, Springfield (Missouri) Senior
High School, 1949 (mimeographed).

6. Organizing the rating instrument. Rating instruments
are customarily organized according to one of four general
plans: (a) the check list, (b) the coded scale (using coded
numbers or letters), (c) the graphic form, and (d) the
descriptive scale.

The check list presents the rater with a list of characteristics
or behaviors to be checked off if they appear to apply
to the person being rated. The "Behavior-observation Record"
(Figure 18) is such an instrument, developed to help teachers
understand the behavior of their pupils.

The coded scale is commonly used in pupil report cards.
Typically, it employs numerals or letters which are described
in one section of the card. Following each rated characteristic,
the code number or letter (frequently a "grade") is indicated
to represent pupil standing. The following excerpts
from the "Primary Pupil Progress Report" of the Corvallis
Public Schools (Figure 19) are illustrative of such organization.⁵
The brief statement of philosophy will perhaps indicate
the basis for the letter and number codes used in connection
with this particular pupil report card. The numerals opposite
reading items represent the pupil's academic status in the
subject as interpreted under "Subject-matter Evaluation." The
letters opposite the citizenship items indicate the pupil's status
with respect to each of the listed traits.


⁵ "Primary Pupil Progress Report," Corvallis Public Schools,
Corvallis, Ore.


BEHAVIOR OBSERVATION RECORD

To understand behavior, it is important to observe the pupil's
reactions on the playground, in the neighborhood, and at home.
Check words or phrases that describe the behavior of the pupil as
you have observed it. Please feel free to individualize the report
as much as possible by adding descriptive comments. If you know
of reasons for the conditions you check, please jot them down at
the right of your answers.

Is this pupil physically strong?
____ Is strong and active
____ Seldom tires
____ Has ordinary endurance
____ Is listless, easily fatigued

Does he have good work habits?
____ Completes what he starts
____ Is able to evaluate his work
____ Capable of sustained attention
____ Needs urging to stay with a task
____ Is easily discouraged
____ Seldom completes the job
____ Easily distracted

Does he get along with other people?
____ Is a successful leader
____ Works and plays well with others
____ Earns recognition
____ Prefers to work by himself
____ Has few friends
____ Is disliked and avoided by others
____ Is quarrelsome
____ Is overaggressive
____ Is easily led
____ Often lies to get out of difficulties
____ Is disobedient to teachers
____ Is destructive
____ Has bad temper when thwarted

What is his usual disposition?
____ Cheerful, happy
____ Kind and sympathetic
____ Self-controlled, calm
____ Quiet, reserved
____ Impulsive
____ Stubborn
____ Moody

FIG. 18. Behavior-observation Record, used in the San Diego Public
Schools (San Diego, 1949).


Corvallis teachers believe that each child's progress should be
reported to him and his parents at least three times each year.
They also believe that each pupil should be evaluated in terms
of his individual growth and progress and in terms of his
achievement in academic work. In order to do this dual evaluation task,
two different sets of symbols and meanings are required.

Individual Evaluation
A - Pupil is using all his ability.
B - Pupil is using nearly all his ability.
C - Pupil is using about half of his ability.
D - Pupil is using less than half of his ability.
E - Pupil is using almost none of his ability and is making very
    little individual progress.

Subject-matter Evaluation
1 - Pupil's achievement and position in this subject are excellent.
2 - Pupil's achievement and position in subject are above average.
3 - Pupil's achievement and position in this subject are average.
4 - Pupil's achievement and position in subject are below average.
5 - Pupil's achievement and position in this subject do not meet
    the standards for this subject.

                                      First    Second   Third
READING                               report   report   report
Reads with understanding                2
Reads well to others                    3
Shows ability to attack new words       2
Enjoys stories and poetry               3

EFFECTIVE CITIZENSHIP
Follows directions promptly             B
Makes good use of free time             B
Completes work                          A
Takes care of property                  C
Accepts criticism                       B
Displays good sportsmanship             D
Uses courtesy in manner and speech      C
Cooperates in classroom                 C
Controls own freedom                    C
FIG. 19. Primary pupil report card.

In the graphic type of rating scale, checks are placed along
a line. Steps in the graphic scale may be presented numerically,
as coded letters or numbers, or as descriptive phrases.
The following item is illustrative of the graphic organization
utilizing descriptive-trait categories:⁶


3. Is his attention sustained?

|---------------|---------------|---------------|---------------|
Distracted:     Difficult to    Attends         Is absorbed     Able to hold
jumps rapidly   keep at a       adequately      in what he      attention
from one        task until it                   does            for long
thing to        is completed                                    periods
another


The graphic presentation permits relatively rapid assessment
of the results of the rating.

Descriptive rating scales may be organized in a variety of
ways. The distinctive feature of this type of scale is that
descriptions indicate the various scale steps. The item for rating
"attention" in the preceding paragraph combines the graphic
and descriptive forms of presentation. The descriptive scale
may also be organized as follows:⁷


B. Does he need frequent prodding or does he go ahead
without being told?

____ Seeks and sets for himself additional tasks
____ Completes suggested supplementary work
____ Does ordinary assignments of his own accord
____ Needs occasional prodding
____ Needs much prodding in doing ordinary assignments
____ No opportunity to observe


⁶ Haggerty et al., ibid.

⁷ Adapted from American Council on Education Personality Report,
Form B, Washington: Committee on Personality Traits, American
Council on Education (mimeographed).

The ordering of the steps constitutes a problem in setting up
rating categories for each trait. In many scales, constant
alternatives are used; that is, a set of steps is established which
applies to all traits included in the scale. The levels
represented may be “Excellent, good, fair, poor”; “Always, usually,
frequently, seldom, never”; “Outstanding, above average,
average, below average, inferior”; and so on. Coded number
or letter forms of organization typically utilize constant
alternatives.

In general, ratings are likely to be more accurate if the steps
of the scale for each trait are set down in random order. That
is, the “good” and “poor” ends of the scale used in the graphic
form may be alternated in random fashion. This procedure
encourages the rater to examine each descriptive statement
and minimizes the tendency to check one or the other side of
the rating sheet continually.
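
The random alternation of the "good" and "poor" ends of a scale can be illustrated with a short sketch. It rests on assumptions not found in the text: the function names are hypothetical, the direction of each trait is decided by a simple coin-toss rule, and positions on the sheet are numbered from the left beginning with 1.

```python
import random


def assign_polarity(traits, seed=None):
    """Decide at random, for each trait, whether the "good" end is on the left.

    A seed may be supplied so that the same sheet can be reproduced.
    """
    rng = random.Random(seed)
    return {trait: rng.choice([True, False]) for trait in traits}


def to_common_direction(checked_position, steps, good_on_left):
    """Convert the position checked on the sheet to a common direction.

    Positions run from 1 (leftmost) to `steps` (rightmost). The value
    returned always runs from 1 (least favorable) to `steps` (most favorable).
    """
    if not 1 <= checked_position <= steps:
        raise ValueError("checked position lies outside the scale")
    if good_on_left:
        return steps - checked_position + 1
    return checked_position


if __name__ == "__main__":
    traits = ["initiative", "responsibility", "social effectiveness"]
    layout = assign_polarity(traits, seed=1)
    for trait in traits:
        side = "left" if layout[trait] else "right"
        print(f"{trait}: good end printed on the {side}")
    # A check at the far left of a five-step scale whose good end is on the left.
    print(to_common_direction(1, 5, good_on_left=True))
```

Recording the layout of each sheet is essential, since the checks cannot be interpreted without knowing which end of each scale was favorable.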


USING RATING DEVICES IN SCHOOLS 


As we have seen, rating pupils is one of the teacher’s cus- 
tomary responsibilities, for ratings are required for report 
cards and school records. Techniques have been designed to 
improve ordinary rating devices such as report cards, and 
these techniques may be utilized for a variety of educational 
purposes.

Ordinarily, the task of reporting pupil status or progress
presents difficulties for the teacher. Should the pupil be
graded on the basis of “standards” or “expectations” for his
grade? On the basis of improvement or growth? Of a comparison
with others of his grade? These questions concern the
main problem of the establishment of a basis for judgment. 
Where established procedures exist in schools, evaluation may 
be improved through clear statement of purpose, definitions 
of traits, and clarification of the basis for judgment. Presenta- 
tions by teacher committees and discussions in staff meetings 
provide a means toward the development of common under- 
standings essential to meaningful evaluations. 

Report forms, of course, must be interpreted by parents. It 
is therefore advisable that the traits evaluated on reports be 
clearly defined as to the meaning and significance of ratings. 
Printed statements and discussions related to the develop- 
ment and use of report cards are often helpful in increasing 
parent understanding of the ratings. 

The teacher may utilize rating devices for a variety of use- 
ful purposes in the classroom. A few of the possible areas of 
usefulness are: 


1. Study of the work habits and skills of pupils.

2. Study of pupil behavior in specified group activities
(such as games, field trips, committee work).

3. Evaluation of performance or products (as in handwriting,
art work, speech, shop work, oral reading).

4. Pupil self-evaluation with respect to specified traits,
activities, and interests (such as cooperation on a field
trip, work habits, study skills, contributions to the
class).


The following suggestions are designed to serve as a guide 
to the teacher in the development and use of rating devices. 


1. In developing the device (scale or check list), relate it
to educational objectives.

2. State clearly the behaviors which are to be observed and
rated.

3. Rate one trait at a time. Whenever possible, it is advisable
to rate all pupils on one trait before going on to the
next.

4. Examine ratings for indications of lack of distribution. 
Although frequencies should be greatest around the 
center of each trait scale, ratings should be dispersed 
over the length of the scale. 

5. Limit the number of traits which are to be considered in 
any one device. 

6. Rate a pupil only after adequate observation of the spe- 
cific characteristic which is being evaluated. 
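
Suggestion 4 can be checked with a simple tally. The sketch below assumes a five-step scale coded 1 to 5 and a hypothetical list of ratings for one trait. The function name and the figures it reports are illustrative choices, not a procedure given in the text.

```python
from collections import Counter

def rating_distribution(ratings, steps=5):
    """Tally ratings on one trait and flag signs of poor dispersion.

    Assumes a non-empty list of ratings coded 1..steps.
    """
    counts = Counter(ratings)
    tally = [counts.get(step, 0) for step in range(1, steps + 1)]
    # Steps of the scale that no pupil was assigned.
    unused = [step for step in range(1, steps + 1) if counts.get(step, 0) == 0]
    # Share of all ratings that fall on the single most-used step.
    top_share = max(tally) / len(ratings)
    return {"tally": tally, "unused_steps": unused, "top_share": top_share}

# Hypothetical ratings of ten pupils on "work habits" (1 = low, 5 = high).
report = rating_distribution([3, 3, 4, 3, 3, 3, 4, 3, 3, 3])
print(report)
```

A tally in which most ratings fall on one step and several steps go unused suggests that the ratings are not dispersed over the length of the scale.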


SUMMARY 


Rating devices represent a convenient means of compiling 
data which provide a basis for the evaluation of pupils. Rat- 
ings are typically subject to certain errors, but their limita- 
tions can be minimized by (1) careful selection, definition, 
and description of the characteristics to be rated, (2) estab- 
lishment of a clear basis for making differential judgments, 
and (3) organization of the scale to provide a meaningful 
dispersion of ratings for each trait. 

The traits selected for use in a rating instrument should be
relatively few in number and should be adapted to the purposes
for which the ratings are to be used. They should be
clearly defined and described, preferably in terms of observable
characteristics. The traits may be assessed on the basis of
either absolute or relative judgments. Rating instruments are
ordinarily organized in the form of (1) a check list, (2) a
coded number or letter device, (3) a graphic scale, or (4) a
descriptive scale.

Rating devices serve a variety of purposes in the school and
classroom. Grading systems, report cards, and cumulative record
forms involve the rating process. The teacher may develop
rating instruments to study and evaluate a wide range of pupil
behaviors and attitudes, and to the extent that he improves his 
ability to develop and utilize instruments of this type, his pro- 
gram of evaluation will be less tied to those educational ob- 
jectives which are more readily assessed by means of the usual 
paper-and-pencil tests. 


STUDY AND DISCUSSION EXERCISES 


1. List educational objectives important in your teaching which
cannot be measured by paper-and-pencil tests.

2. Discuss the merits and limitations of absolute and relative
measures as they apply to rating instruments.

3. What specific values do you see in pupil self-evaluation by
rating devices? In what ways might self-rating scales be useful in
your classroom?

4. How would you develop a pupil self-rating scale to stimulate
interest in neatness in written work?

5. Develop a rating device to assist in the evaluation of the
products or procedures of pupil work in any one of the following
areas: shop, English, art, science, handwriting.

6. (a) Select a subject area. Develop a definition and behavior
description of study skills or work habits in that area. (b)
Organize a rating instrument based on your definition and
description of the trait. (c) Present reasons for your selection of a
particular type of scale organization.


SUGGESTED ADDITIONAL READINGS 


Cronbach, L. J.: Essentials of Psychological Testing, New York:
Harper & Brothers, 1949.
  Chapter 18 of this comprehensive text is a discussion of
  techniques of observing behavior in normal situations. The values
  and limitations of rating methods are considered.
Greene, E. B.: Measurements of Human Behavior, rev. ed., New
York: The Odyssey Press, Inc., 1952.
  Chapter 16, “Types of Estimates,” includes a relatively
  comprehensive account of rating methods and devices.
Guilford, J. P.: Psychometric Methods, New York: McGraw-Hill
Book Company, Inc., 1936.
  Chapter 9 presents an excellent account of rating methods.
  Specific limitations and advantages of various types are indicated.
Jordan, A. M.: Measurement in Education, New York: McGraw-Hill
Book Company, Inc., 1953.
  Chapter 18, “Measurement of Personality Traits,” includes a
  discussion of rating scales. Illustrative materials are included.
Micheels, W. J., and M. Ray Karnes: Measuring Educational
Achievement, New York: McGraw-Hill Book Company, Inc.,
1950.
  Chapter 13 is concerned with observational techniques in
  relation to evaluation. Guiding principles are presented for using
  the results of observations.
Remmers, H. H., and N. L. Gage: Educational Measurement and
Evaluation, rev. ed., New York: Harper & Brothers, 1955.
  Chapter 12 includes a concise discussion of rating-scale
  methods. Suggestions for the development of graphic scales are
  presented.
Thomas, R. M.: Judging Student Progress, New York: Longmans,
Green & Co., Inc., 1954.
  Chapter 11 presents a relatively nontechnical account of rating
  scales and check lists. The discussion centers around school use
  of the instruments.
Thorndike, R. L., and E. Hagen: Measurement and Evaluation in
Psychology and Education, New York: John Wiley & Sons, Inc.,
1955.
  Chapter 13 presents a relatively comprehensive account of
  rating methods. Suggestions for the improvement of ratings are
  included.


CHAPTER TWELVE 


Constructing and Using Teacher-made Tests


Standardized tests produced by specialists have an important
part to play in education when they are used with proper
regard for their advantages and limitations. Some of these
limitations can be avoided by using teacher-made tests. Tests
prepared by the teacher compensate for some of the weaknesses
inherent in standardized tests, but they are in turn
subject to certain shortcomings. They are not a panacea for
problems of evaluation, but they do serve important purposes.
Careful test construction and interpretation can increase their
usefulness.


THE NEED FOR TEACHER-MADE TESTS 


As we have seen, standardized tests do not always fit local
situations. For example, in one school a test of reading
readiness is given to the entering first graders, and on the basis
of the results certain pupils are started on the reading program.
Gratifying success may be achieved by all these starters.
However, when the same procedure is followed in another school,
considerable difficulty may be encountered by several pupils
for whom success was indicated by the test results. The
difference may result from the fact that the reading materials
used in the second school were more difficult than those used
in the first.

Another example concerns achievement testing. In one
school, pupils in the fourth and fifth grades show up year
after year as substantially below the norm in arithmetic,
though they do the work normal for their grade and age in
other areas. Pupils who are above average in ability do
above-average work in other subjects than arithmetic. In one such
system the principal planned to bring in special help for the
teachers because of their indicated need for guidance in
teaching arithmetic. The explanation was discovered to be the fact
that in this locality it had been previously decided that
arithmetic instruction could profitably be delayed until the fourth
grade rather than offered in the third. The disadvantage of
the delay does not disappear until two or three years later.
By the time a group reaches the seventh grade, more of the
pupils will be happy and successful in their work in arithmetic
if they started studying it in the fourth grade than if
they had started it in the third.

A third example of the influence of local norms was
encountered in a school system where the formal study of American
history was subordinated to the study of local problems
as an approach to history. The pupils did not do well on
standardized tests in which there was considerable emphasis
on American history.

Different schools in the same community may find it desirable
to interpret norms quite differently. A school which
draws its pupils exclusively from a neighborhood composed
of professional people and business owners may be unjustifiably
proud of the record made by its pupils. Teachers in a
school whose pupils come mainly from lower socioeconomic
strata and less stimulating environments may be discouraged
because of low standing on national norms when, in fact, they
might well be proud of their record in “pupil adjustment.”


1 W. A. Brownell and C. B. Chazel, “The Effects of Premature Drill
in Third-grade Arithmetic,” Journal of Educational Research,
29:17-28, 1935.

Another shortcoming of standardized tests is that they are
not designed to explore and analyze small units of subject
matter. Thus, the teacher may wish to give a test covering a
half semester’s work or a unit on “Community Health Prac- 
tices.” Tests can be of assistance in the study of these smaller 
units, but the standardized test is not likely to help because 
of its comprehensive and general nature. 

Local variations in curricular practices, the nature of the 
pupil population, and the division of work into smaller units 
may make it impractical to use standardized tests as the sole 
measuring device. In such situations the teacher-made test
can make a valuable contribution to better pupil
understanding.*


Uses of Teacher-made Tests 


One of the values of teacher-made tests is that they
compensate for the shortcomings of standardized tests. Thus, as
we have seen, teacher-made tests can be better adapted to
fit local pupil and curricular situations and are useful in
exploring and analyzing small units of study. In addition, they
can be made to serve as a means of motivation and diagnosis
of weaknesses.

Teacher-made tests can be used to supplement and complement
other kinds of motivation. It has previously been indicated
that it is bad practice to consider a test result as an
“end” of education. But the test which is used to indicate
progress toward a goal and to challenge one's sense of
achievement is a helpful educational instrument. Children can, and
do in favorable circumstances, enjoy taking tests. When
children fear tests, it is because of the emphasis placed on results.


* A. M. Jordan, Measurement in Education, New York:
McGraw-Hill Book Company, Inc., 1953, pp. 40ff.

Although much is heard these days about the desirability of 
making motivation intrinsic, or making the task interesting in 
itself, it is more exact to observe that interests grow, develop, 
and evolve. Interests are much more than discoveries of some- 
thing innate; they often develop as the result of the student's 
originally being "forced" to engage in a given area of experience.
Interests grow as the result of knowledge and the development
of competence, of success, and familiarity. Hence,
an examination or series of examinations may serve as the
original motivation for the pupil to check his knowledge and
progress and to gain success and familiarity. The teacher-made
examination, given at shorter intervals than the standardized
examination, can supplement other continuous experiences. It
can easily be designed specifically as an additional source of
motivation.

Teacher-made examinations can also serve as an approach 
to diagnosis; that is, the test can be so designed that the scores 
pupils make will reveal areas in which they are weak.
Weaknesses in number combinations, for example, or in certain
arithmetical processes can be detected from the results of a
test which is so constructed that certain of the questions deal
with specific skills or areas of knowledge.

Teacher-made examinations are probably customarily used
to help evaluate pupil achievement. As we have seen, this is
not an easy task. However, with study and care, it is possible
to secure approximate and tentative data which will be of
value in determining pupil progress. In order to obtain this
information, the teacher-made test should be modeled after
the standard examination by seeking to improve the degree
of objectivity, reliability, and validity of the test.


APPLYING THE CRITERIA OF A “GOOD” TEST


The criteria which apply to standardized tests are to a large 
extent applicable to teacher-made tests. Objectivity is desir- 
able; hence, it is recommended that tests be of the short- 
answer type in so far as possible. These would include true- 
false, multiple-choice, completion, and matching questions. 

Examinations of the so-called essay type are too difficult
to score objectively to warrant a great deal of consideration.
The contention that essay examinations teach pupils to
organize their thoughts can be disposed of with the argument
that an examination, with its accompanying pressure, is not
a situation that is particularly conducive to the stimulation of
logical thinking. If organization of thought is the major
objective, it might be better to offer this training in special papers
or themes. The teacher's evaluation of the essay might be
more accurate when it is a special paper than when it is part
of an examination which must be given a grade.

One type of short-answer question is the completion item,
which requires the pupil to fill in a word or group of words
in a blank space in a sentence or paragraph; the part of the
sentence which does appear gives the context into which the
missing word or words will fit. Examples are: “Metals from which
our common United States coins are made are pot metal, copper,
______, and ______.” “Bobby’s cafeteria lunch cost [illegible]
cents. In addition he spent 5 cents on an ice-cream bar and
7 cents for pop. The total cost of his lunch was ______.”

Completion questions are quite often difficult to formulate.
All too frequently there is more than one word that is
appropriate for a particular blank. After the teacher has made
one or two exceptions to what he first thought the answer
should be, it is difficult to determine how much deviation
should be permitted. Completion questions have been criticized
because they call for factual knowledge. Actually there should
be no objection to the learning of facts, if they are
meaningful to the pupil. This type of question is probably best for
testing in such areas as arithmetic, knowledge of historical
personages, geographical locations, and dates. It will be found
to be inadequate for testing a knowledge of social trends,
functions of the organs of the body, foods required in various
diets, or commercial and agricultural products of nations
or states.

The true-false, or right-wrong, type of test item seems easy 
to construct but is actually so difficult to design that it has 
relatively little usefulness. Experts in test making rarely use it. 
True-false questions tend to place a premium upon verbatim 
learnings; since few things are so clearly right or wrong,
answers are often quite debatable, much to the chagrin of the
teacher who made the test. Further, this type of question tends
to penalize the brighter student, because it is he who most
frequently thinks of the exception or conditional factors that
can alter the meaning. Let us examine the item, “Coins of
the United States are made of pot metal, copper, nickel, and
silver.” The statement is true in a sense, but gold might also
have been included; thus it is false because it is not inclusive
enough. If the statement were changed to “Coins of the
United States are made only of pot metal, copper, nickel, and
silver,” the answer is still debatable. There are gold coins still
in existence, but one could argue that they are not being made
now. The limited number of possible alternatives increases
the possibility of successful guessing and thus reduces the
diagnostic value of the test.

Since test makers show a tendency to make more items true
than false, the student may systematically mark all the items
that he does not know as true and be gratified with the result.
In order to avoid this, penalties are sometimes imposed for
guessing, the total score being obtained by a “right-minus-wrong”
formula. This practice can be criticized on the basis
that complicating the scoring of an inadequate test does not
make it more valid and reliable.
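
The “right-minus-wrong” formula mentioned above can be illustrated with a short sketch. It assumes that an omitted item is recorded as `None` and counts as neither right nor wrong. The function name, the answer key, and the pupil's answers are hypothetical.

```python
def right_minus_wrong(answers, key):
    """Score a true-false test as rights minus wrongs.

    Omitted items (recorded as None) count as neither right nor wrong.
    """
    rights = sum(1 for a, k in zip(answers, key) if a is not None and a == k)
    wrongs = sum(1 for a, k in zip(answers, key) if a is not None and a != k)
    return rights - wrongs

# Hypothetical eight-item key.
key = [True, False, True, True, False, False, True, False]

# A pupil who knows the first five items and marks the other three "true".
guessed = [True, False, True, True, False, True, True, True]

# The same pupil leaving the three unknown items blank.
omitted = [True, False, True, True, False, None, None, None]

print(right_minus_wrong(guessed, key))
print(right_minus_wrong(omitted, key))
```

In this example the guesser has six rights and two wrongs. The formula removes the advantage gained from the one lucky guess, but, as the text observes, it does nothing to improve the items themselves.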
Nevertheless, the true-false item is frequently used in the
classroom, although it should not be relied upon to too great
an extent. Measures can be taken to increase its usefulness:


1. All items should be brief and without conditional factors.

2. The use of such words as always, never, entirely, and
absolutely should be avoided.

3. The true-false item is more useful in language studies
and mathematics than it is in social studies and general
science.

4. Statements should not be lifted from the textbook 
verbatim or with only minor revisions. 

5. Items should not be arranged in a regular pattern, such 
as T, F, F, T, F, T, T, F, etc., or T, F, T, F, etc. 


In general, it seems wise to recommend that true-false 
questions be cautiously used except for purposes of review 
and drill. Their use for evaluative or diagnostic purposes is
highly questionable.

The matching question has been found to be quite practicable
for classroom use. Two lists are set off or distinguished
as pairs, as in the following example: 


Place the letter of the item in the right-hand column in the space
provided in the numbered (left-hand) column with which it is
most closely associated:


___ 1. heart              a. helps put oxygen into the blood
___ 2. lungs              b. place where food is mechanically
___ 3. thyroid               and chemically reduced
___ 4. arteries           c. carries blood to the heart
___ 5. striated muscles   d. muscle which pushes blood
___ 6. stomach            e. carries blood to the extremities
                          f. muscles used in digesting food
                          g. controls oxygen metabolism
                          h. muscles used in locomotion
                          i. muscles used in breathing


Matching questions are time-consuming for the student, since
he has to search for the relationships; and for the elementary
pupil they may be confusing as well as time-consuming. Hence
relatively few items should be grouped together. The columns
should be of different lengths so that thinking will replace
guessing at some of the more difficult items. Primary-grade
teachers have found that it is easier for pupils to understand
the directions if they are told to use a line to connect the two
related statements. This makes scoring somewhat harder, but
the advantage in pupil understanding may compensate. A
combination matching-completion question can be made by
providing a group of words or phrases from which the pupil can
select to fill in the missing parts of a sentence or paragraph.

The user of standardized tests will note that the most
frequently used type of test question is the multiple-choice item.
This question is commonly found in the “test yourself”
features in magazines and newspapers. It possesses several
advantages: the number of alternate responses (3, 4, or 5)
reduces the chances of guessing more than is the case with the
true-false or matching type of question; the listing of plausible
answers stimulates thinking; the limitation (as compared with
the completion item) of possible answers eliminates
ambiguity in scoring; and the technique of scoring is not
complicated. Multiple-choice questions are good teaching devices
because discussion of the alternatives and analysis of the
student's errors after the examination provides the opportunity
for careful explanation.

Some of the advantages of the multiple-choice item are
counterbalanced by the difficulty of making the questions,
however. It takes considerable time to construct fifty or a
hundred items of this type, certainly much more time than
it normally takes to construct ten essay questions. On the
other hand, the time is compensated for by increased
objectivity and ease of scoring.

It is recommended that the making of good multiple-choice
items be a continuing project for teachers. This can be done 
—at a saving of time over a period of years—by making a set 
of 3-by-5 test cards for the various areas (social studies, 
health, science) with which the teacher deals. Each card will 
contain one multiple-choice item. A notation on the card in- 
dicates the phase of the subject with which it deals (history: 
"Pilgrims"). When it is time to make the test, the items that 
are most pertinent to the particular manner in which the unit 
was studied during the term are selected to be reproduced. 
After the test and the discussion of the items, an item analysis 
will reveal that some are of questionable value. A tally is kept 
on the effectiveness of each question. Some will be correctly 
answered by all; too many of these items will indicate that the 
test is too simple. If one item is missed by all, it is probably 
too difficult or is ambiguously stated. Poor questions are either 
revised or discarded. The next time the same area is to be 
covered by a test, a few new items are added to the revised 
set of cards to cover current emphases. By keeping separate
the cards dealing with subdivisions of the total area, the 
teacher can easily make the test contribute to diagnostic pur- 
poses. 
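
The tally kept on each card can be sketched in a few lines. The example assumes that each pupil's paper is recorded as a list of chosen response numbers and that the key lists the correct response for each item. The data and the wording of the notes are hypothetical; the two flags follow the rules of thumb stated above.

```python
def item_analysis(responses, key):
    """For each item, report how many pupils missed it and a note on its value."""
    n_pupils = len(responses)
    results = []
    for i, correct in enumerate(key):
        missed = sum(1 for sheet in responses if sheet[i] != correct)
        if missed == 0:
            note = "answered correctly by all; may be too simple"
        elif missed == n_pupils:
            note = "missed by all; too difficult or ambiguous"
        else:
            note = "keep"
        # The fraction is written as on the card, e.g. 12/37.
        results.append((i + 1, f"{missed}/{n_pupils}", note))
    return results

# Hypothetical three-item test taken by four pupils.
key = [3, 2, 1]
responses = [
    [3, 2, 4],
    [3, 1, 2],
    [3, 2, 5],
    [3, 5, 3],
]

analysis = item_analysis(responses, key)
for row in analysis:
    print(row)
```

With a real class the counts would be copied onto the cards after each use, so that weak items can be revised or discarded before the area is tested again.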

Although the questions are discussed after the test, the
student does not keep his test. To permit him to do so might lead
some of the more sophisticated pupils to get the exam and
cram for the specific questions on it rather than to study
widely. Just as important, however, is the fact that the teacher
cannot afford the time to make a carefully constructed new
set of multiple-choice questions every time the area is covered;
besides, there would be a loss in terms of the experience
gained. Economy of the teacher's time can also be achieved
by providing a series of spaces on the left or right side of the
paper in which to place the number or letter of the chosen
response. After the test items have been checked through use,
it may be advisable to have a separate answer sheet and ask
the pupils to refrain from marking the question sheet. Defective
items on the reused question sheet can be ruled out and
a substitute item can be written on the board or mimeographed
on a separate page to replace the deleted item.

The multiple-choice item satisfies to a large extent the
criteria of a good test: it is objective, reliable, and economical
of the teacher's time; it samples widely and can be so planned
that it has a significant degree of validity. Techniques for
securing this validity will be discussed in the following section.


TECHNIQUES OF TEST CONSTRUCTION 


In constructing tests, it is important first of all to determine
just exactly what should and will be tested; since the purpose
of tests is to help determine the extent to which educational
objectives are being achieved, the test should be devised in
terms of the specific objectives of teaching a particular unit of
study. The teacher who prepares lesson plans will have done
this much earlier. For those who do not write lesson plans, it
would still be desirable to state the objectives that will serve as
a guide to the construction of the items that really test what
one has been teaching. This is clearly a long step toward making
a valid test, a test that actually measures what it purports
to measure. Comparing each item with the final objective of
the test will not assure validity, but it will probably increase it.

The goals or objectives of a unit must be specific in order
to serve as a guide in making a valid examination. For example,
such specificity is found in the following goals for each
student in a unit in seventh-grade social studies:


1. Reads news of general (first-page) interest in the newspaper.
2. Listens to the radio for purposes of gaining information
(weather, farm reports, news).
3. Can state the importance of some current news events.
4. Has some opinions on contemporary events.
5. Knows the names and positions of persons in the headlines.
6. Knows the geographical location of places in the news.
7. Knows what sections of the paper contain certain kinds
of news.
8. Is acquainted with several features in the local newspaper.



Such a list of specific aims can readily be translated into
test items. The teacher can determine the number of items
which should be allotted to each goal by analyzing its relative
importance and reflecting on the amount of time spent on the
particular topic in class. Robert M. W. Travers suggests
that the teacher keep a “blueprint” of the class as a guide in
making a valid examination. To do this, the teacher lines off
a sheet of paper in blocks and labels the horizontal rows
with the educational goals for the topical subdivisions of the
course, represented by the vertical blocks. Thus, under the
heading of the educational goal “ability to spend money
wisely,” reading across to the vertical column under the
heading of “budget,” the teacher writes descriptions of the
activities by means of which one reaches the goal. When it is
time to prepare a test, the entries in the boxes give clues to
suitable items.
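
The allotment of items to goals and the blueprint grid can both be sketched briefly. The example assumes that the teacher expresses the relative importance of each goal as a whole-number weight. The weights, the goal labels, the blueprint entry, and the rounding rule are all hypothetical and are not taken from Travers.

```python
def allot_items(weights, total_items):
    """Divide total_items among goals in proportion to teacher-assigned weights."""
    total_weight = sum(weights.values())
    exact = {goal: total_items * w / total_weight for goal, w in weights.items()}
    allotted = {goal: int(share) for goal, share in exact.items()}
    # Hand out items lost to rounding down, largest fractional part first.
    leftover = total_items - sum(allotted.values())
    by_fraction = sorted(exact, key=lambda goal: exact[goal] - allotted[goal], reverse=True)
    for goal in by_fraction[:leftover]:
        allotted[goal] += 1
    return allotted

# Blueprint grid: (educational goal, topical subdivision) -> class activities.
# The single entry shown is a made-up illustration.
blueprint = {
    ("ability to spend money wisely", "budget"): ["kept a weekly record of spending"],
}

# Hypothetical weights for three of the seventh-grade goals listed earlier.
weights = {
    "reads first-page news": 3,
    "knows persons in the headlines": 2,
    "knows places in the news": 1,
}

plan = allot_items(weights, 20)
print(plan)
```

The weights stand in for the teacher's own judgment of importance and class time; the arithmetic merely keeps the total number of items fixed.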

The following criteria will be helpful in making
multiple-choice questions: The key proposition should be stated
in the form of a problem; for example, “The first thing to do
upon learning of a case of scarlet fever in the community is to
. . .” This type of presentation is important even in testing
for facts, because the ultimate goal is that pupils will use
facts in solving problems. The alternative responses should
be as plausible as possible; unless they have some plausibility,
the choice will be so easy that no problem is involved. In
response to the problem stated above, each of the following
has some plausibility: “(1) examine the water supply, (2)
inoculate the citizens of the community, (3) examine the
milk supply, (4) screen all windows, (5) spray all refuse
piles and garbage cans with DDT.”


* Robert M. W. Travers, How to Make Achievement Tests, New
York: The Odyssey Press, Inc., 1950, pp. 25-29.

The wording of the questions should be appropriate to the
grade level concerned; if the items are too easy, the difficulty
should not be increased by the introduction of more difficult
words unless vocabulary development is the goal. No answer
should depend upon knowledge of the answer to another question
in the same examination. Conversely, the information
given in stating an item should not provide a lead to answering
another question. The statement and the alternatives
should be as simple as possible; the correct answering of the
question should not depend on the pupil's ability to interpret
a difficult statement. The answer which is supposed to be
correct should be unquestionably correct; that is, the various
books available to pupils should agree on the point concerned.
The teacher should never have to resort to saying, “In our
book the answer is . . .” Alternative answers should cite
commonly held erroneous views as a means of sharpening the
pupil's perception of unjustified beliefs; for example, “A
universal characteristic of adolescents is (1) they are physically
awkward, (2) they resist school authority, etc.” Whenever
possible, test for knowledge of principles and generalizations
as contrasted to isolated facts. Tests of memory show that
facts are forgotten more quickly than principles and
generalizations, which have greater significance than facts for solving
problems later.

These observations may make the task of test making look 
formidable. Actually, practice and guided experience reduce 
the difficulty. Soon the teacher gives almost automatic heed 
to the suggestions cited above, and usable test items occur to 
him readily as the study of a unit progresses. A pair of teachers
working together can be of great help to one another. But
even though the task of making valid and objective tests is
an arduous one, it will pay in the increased effectiveness of
the teacher’s testing program. Testing will serve the purpose
for which it is designed: to facilitate the reaching of one's
educational objectives.


Some Sample Setups


Careful attention to “setup” will help to make tests
understandable and economical. For example, questions can be put
on 3-by-5 cards such as the one shown here.


health                                          physiology

The use of beverage alcohol is condemned because

(1) it speeds up heart action
(2) it causes diseases of the liver
(3) it reduces physical and mental efficiency
(4) it slowly disintegrates the brain
(5) it hardens the arteries

Missed by ______ out of ______ taking the test.
Date used: ______
Pupil comments:


The notation in the upper left-hand corner indicates the broad
area in which the question is used, and the note at the upper
right gives the particular subdivision. The other notes can be
reduced to 12/37, which means that the question was missed
by 12 of 37 pupils; the date can be simply 11/12/57, and
comments may be placed on the reverse side of the card.

After the cards have been prepared, those the teacher se- 
lects are placed on the test paper with appropriate headings: 


HEALTH EDUCATION 
11-12-57 
PUPIL'S NAME ____________ 

Place the number of the response which you select as correct in 
the space provided to the left of the number of the question. The 
first one is answered correctly. 

_3_ 1. The use of beverage alcohol is condemned because 
       (1) it speeds up heart action 
       (2) it causes diseases of the liver 
       (3) it reduces physical and mental efficiency 
       (4) it slowly disintegrates the brain 
       (5) it hardens the arteries 

___ 2. Milk should be in the diet of most persons, adults and 
       children, because 
       (1) it is the food Nature planned for us 
       (2) it contains so many ingredients that it rounds out 
           the diet 
       (3) it is essential to growing sound teeth 
       (4) it is a clean, safe food 
       (5) it is inexpensive 

It can readily be seen that a multiple-choice examination 
takes several pages of mimeographing; for this reason, and 
because pages must be turned for scoring, it is quite time- 
consuming. A separate answer sheet with a number of blanks 
on it helps to offset this disadvantage. 

PUPIL'S NAME ____________   DATE ________   SUBJECT ____________ 

Place all of your answers on this answer sheet. Do not write on 
the question sheets. 

 1. ____    26. ____    51. ____    76. ____ 
 2. ____    27. ____    52. ____    77. ____ 
 3. ____    28. ____    53. ____    78. ____ 
 4. ____    29. ____    54. ____    79. ____ 
 5. ____    30. ____    55. ____    80. ____ 

Such an answer sheet can be scored by writing the correct 
responses on strips of stiff paper or cardboard and laying the 
appropriate strip alongside the column of answers. This proc- 
ess involves the changing of strips or the shuffling of papers 
as each column is scored. However, scoring can be done 
in one operation by cutting slots out of a piece of cardboard 
and writing beside each slot the answers for one of the col- 
umns, as in the accompanying sketch. The number of the 
question is not indicated, since this would clutter the 
card. Errors can be avoided by being careful to make the slots 
the exact length of the answer column. 

[Sketch: cardboard scoring key with a slot for each answer column and the correct answers written beside each slot.] 

Perhaps the most rapid hand-scoring method is to provide 
an answer sheet on which the student has to block out the cor- 
rect response, as follows: 


PUPIL'S NAME ____________   DATE ________   SUBJECT ____________ 

Completely block out with soft lead pencil the number of the re- 
sponse which you select. Indicate only one answer for each item; 
double answers are scored as incorrect. 

 1. 1 2 3 4 5    26. 1 2 3 4 5    51. 1 2 3 4 5    76. 1 2 3 4 5 
 2. 1 2 3 4 5    27. 1 2 3 4 5    52. 1 2 3 4 5    77. 1 2 3 4 5 
 3. 1 2 3 4 5    28. 1 2 3 4 5    53. 1 2 3 4 5    78. 1 2 3 4 5 
 4. 1 2 3 4 5    29. 1 2 3 4 5    54. 1 2 3 4 5    79. 1 2 3 4 5 
 5. 1 2 3 4 5    30. 1 2 3 4 5    55. 1 2 3 4 5    80. 1 2 3 4 5 

The scoring sheet for this type of answer sheet consists of a 
piece of stiff paper or cardboard with holes punched so that 
only the correct responses show. 




In using this kind of scoring device it is necessary first to 
scan the papers for double answers. (This is necessary also 
when papers are machine scored, so there is no relative dis- 
advantage in this respect.) Each number that appears clear 
under the punch hole and thus has not been blocked out by 
the pupil is an incorrect answer; hence all one has to do is 
count these for the minus score. If an item analysis is to be 
made, it will be necessary to cross out the number with a col- 
ored pencil as one counts the incorrect scores. 
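
The counting done with the punched stencil can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a description of any scoring machine: the function names score_paper and item_analysis and the data layout (a dictionary of question numbers and the set of responses each pupil blocked out) are hypothetical. As on the stencil, a question counts as incorrect whenever the correct number was not blocked out, and a double answer is also scored as incorrect.

```python
def score_paper(key, responses):
    """Score one pupil's paper against the key.

    key:       dict mapping question number -> correct response (1 to 5)
    responses: dict mapping question number -> set of responses the pupil
               blocked out (hypothetical layout, assumed for this sketch)

    Returns (minus_score, missed), where missed lists the question numbers
    counted as incorrect. Omitted and double answers are both incorrect.
    """
    missed = []
    for question, correct in key.items():
        marked = responses.get(question, set())
        if marked != {correct}:
            missed.append(question)
    return len(missed), missed


def item_analysis(key, papers):
    """Tally, for each question, how many pupils missed it."""
    tally = {question: 0 for question in key}
    for responses in papers:
        _, missed = score_paper(key, responses)
        for question in missed:
            tally[question] += 1
    return tally
```

The tally for a question, set beside the number of papers scored, gives the same kind of figure as the "12/37" note on the question card.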

The danger in this system is that, since the original setup is 
so time-consuming, the teacher will be tempted to use the 
same questions and the same answer sheets term after term. 
Actually this is not undesirable, providing the defective or 
outmoded items are constantly weeded out. This can be done 
simply by telling the pupils that question 23, for example, has 
been eliminated. "A substitute question 23 is written on the 
board [or on a mimeographed separate sheet]. Answer this 
question now, immediately, so that you will not forget and 
answer the question that is on the regular test sheets." It will 
be convenient if the question is so arranged that the response 
is the same as the one originally designated for the defective 
item; in this way the answer sheet or stencil will not have to 
be changed. 

It is recommended that no scoring formulas be used, such 
as the right-minus-wrong (R - W) scoring of true-false an- 
swers to discourage guessing. Actually, differences in children 
are such that some will not be discouraged from guessing and 
others will not put down answers they are not sure of. Per- 
haps it is better to encourage intelligent, informed guessing 
than to discourage blind guessing. At any rate, the accuracy 
of the teacher-made instrument is not so great that its reliabil- 
ity will be significantly increased by scoring formulas. Fur- 
ther, it is not the score that is significant. Rather the object is 
to discover what areas need particular attention, what is caus- 
ing a pupil's particular difficulty, and approximately what 
progress each pupil has made. Scoring formulas will not help 
to a significant degree in any of these purposes. 
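
For readers unfamiliar with the formula, the arithmetic of right-minus-wrong scoring can be sketched as follows. The function names and the two pupils' figures are invented purely for illustration and are not data from any test.

```python
def rights_only(right, wrong):
    """Score as the number of correct answers; omissions and errors are ignored."""
    return right


def right_minus_wrong(right, wrong):
    """R - W scoring for true-false items: each wrong answer cancels one right answer."""
    return right - wrong


# Hypothetical pupils on a 50-item true-false test.
# Pupil A answers only the 30 items he is sure of and omits the rest.
# Pupil B answers the same 30 correctly and guesses at the other 20,
# happening to get 12 right and 8 wrong.
pupil_a = {"right": 30, "wrong": 0}
pupil_b = {"right": 42, "wrong": 8}

for name, paper in (("A", pupil_a), ("B", pupil_b)):
    print(name, rights_only(**paper), right_minus_wrong(**paper))
```

In this invented case the formula narrows the gap between the two pupils but does not remove it, and neither score says which areas need attention or what is causing a pupil's difficulty.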


RELATIONSHIP OF TEACHER-MADE TO 
STANDARDIZED TESTS 


Both teacher-made and standardized tests play important 
roles in the accomplishment of the ultimate purpose of all 
tests: to facilitate pupil growth. Standardized tests are prob- 
ably more accurate than most teacher-made tests; they are 
more reliable, objective, and valid; but teacher-made tests 
have the advantage of being more readily adaptable to local 
conditions. They are relatively less expensive and can thus be 
used more frequently as checks on progress, as a means of 
motivation, and in some instances for aiding diagnosis. Since 
both kinds of test have a part to play in the understanding of 
pupils and in the stimulation of their growth, it is obvious that 
the teacher is not limited to the use of either one or the other 
exclusively; the two are supplementary to one another. 

Most modern educators have no objection to drill and re- 
view providing it is motivated and the pupil understands the 
material. The inexpensive teacher-made test can provide 
some of this drill in a situation which is enjoyable for the 
pupil if the score is not overemphasized. The test can also 
be a check on understanding, since the items are spe- 
cially designed to be discussed in class, whereas the discus- 
sion of items on a standardized test is specifically avoided be- 
cause it would produce "practice effect" or coaching that 
would invalidate the test. 

The teacher-made test is a useful factor in motivation. The 
knowledge gained in preparation for tests has led many pu- 
pils to develop new interests. It can accomplish this for more 
pupils when teachers stop making scores the basis for inter- 
personal comparisons and use the results to show each pupil 
what progress he is making and where he needs special work. 
The teacher should constantly bear in mind, however, that 
this transfer of interest from the test and its results to the sub- 
ject under consideration is not automatic. It will be necessary 
for him to show how the interest should expand, how the 
material can be used more effectively than it is in a pencil- 
and-paper test, and to indicate the personal value of increased 
knowledge. 

Thus standardized tests are of greatest value in estimating 
achievement over a period of time—from the beginning to 
the end of the term—whereas teacher-made tests are of great- 
est aid in facilitating intermediate steps in this long-term 
growth. 


SUMMARY 


Standardized tests have inherent limitations. They often do 
not fit local situations from the viewpoint of the type of mate- 
rials used, curricular emphases, and the ability and back- 
ground of the pupil population. Teacher-made tests can be 
used to compensate for some of these limitations because they 
are readily adaptable to local emphases, they can be devised 
to fit small subdivisions of a subject area, and they can and 
do serve as sources of motivation. When carefully made and 
used with due caution, the teacher-made test is effective for 
diagnosis. 

Teacher-made tests should meet the criteria of good stand- 
ardized tests. Objectivity can be increased by using short- 
answer items broad enough in range to reveal ability and lim- 
ited enough to be easily scored. Validity can be increased by 
clearly formulating the aims of the unit or area on which the 
test is based. Reliability can be increased by continuous study 
of the teacher-made test through periodic analysis of pupils' 
answers. Ease of scoring should be kept in mind in construct- 
ing the test, and special answer sheets and scoring devices 
will increase economy. The authors feel that the over-all ad- 
vantages of matching and multiple-choice answers are such as 
to warrant preferring them over completion and true-false 
items in short-answer tests and over the essay type of test. 

Teacher-made tests are not intended as substitutes for 
standardized tests. Both types play valid roles, 
and both should be regarded as aids to instruction and as 
supplementary to each other, not as ends of education. 


STUDY AND DISCUSSION EXERCISES 


1. What advantages of teacher-made tests over standardized 
tests have you found, through your reading or experience, other 
than those listed in this chapter? 

2. Under what conditions is it permissible for the teacher to 
use subjective data in the evaluation of his pupils? 

3. Recall some of the experiences you had as a student taking 
true-false examinations. Would your experiences accord with the 
observations made in this chapter about the use of this type of 
question? 

4. Prepare your own set of fifteen or twenty multiple-choice items 
covering the content of a chapter or several chapters of this book 
and give it to others to take and then to criticize. Summarize what 
you have learned about multiple-choice questions from this experiment. 

5. Write out the objectives of a class you have taught or are 
preparing to teach and submit the list for criticism. Design the 
objectives in such a way that testing on them is feasible. 

6. Examine the feature "It Pays to Increase Your Word 
Power" in any issue of the Reader's Digest. Point out the in- 
stances in which the author has attempted to mislead by present- 
ing a "plausible" but incorrect response. How can you use this 
practice profitably in test construction? 

7. Consult a group of persons who are interested in tests and sum- 
marize the techniques they suggest for improving such tests as 
the test to stimulate interest, the pretest at the beginning of 
a unit of work, etc. 

Greene, Harry A., Albert N. Jorgensen, and J. Raymond Ger- 
berich: Measurement and Evaluation in the Elementary School, 
2d ed., New York: Longmans, Green & Co., Inc., 1952, pp. 160- 
This chapter deals with different types of objective questions 
(completion, multiple-choice, matching, etc.) and cites help- 
ful suggestions for constructing them. The suggestions for each 
kind of test item are summarized. 

Lee, J. Murray, and David Segel: Testing Practices of High School 
Teachers, U.S. Department of the Interior Bulletin 9, Office of 
Education, 1936, 42 pp. 
This bulletin reports a survey of testing practices and evaluates 
their effectiveness as a basis for making suggestions for improve- 
ment. The bulletin is designed for administrators and forward- 
looking teachers. 

Ross, C. C. (rev. by J. C. Stanley): Measurement in Today's 
Schools, 3d ed., Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1954, 
pp. 139-206. 
Part II, consisting of three chapters, deals with planning, pre- 
paring, and evaluating teacher-made tests. Different kinds of 
test items and the special problems each presents are dealt with, 
and suggestions are offered. 

Torgerson, Theodore L., and Georgia Sachs Adams: Measurement 
and Evaluation, New York: The Dryden Press, Inc., 1954, pp. 
220-243. 
The uses and characteristics of good teacher-made tests are de- 
scribed. Suggestions are given for making essay, completion, 
true-false, multiple-choice, and matching questions. A check 
list is provided for evaluating teacher-made tests. 

Travers, Robert M. W.: How to Make Achievement Tests, New 
York: The Odyssey Press, Inc., 1950, 180 pp. 
This short book is full of practical suggestions for planning and 
constructing teacher-made tests. It covers all subjects, but sci- 
ence teachers will find the explanations and examples especially 
helpful. 


CHAPTER THIRTEEN 


Improving Appraisal Practices 


Pupil development is a many-faceted phenomenon. Aspects of 
physical, mental, emotional, social, and academic growth are 
present in problems of measurement and evaluation. Many 
factors are at work to produce growth in any one of these 
areas or to produce interrelated (organismic) growth in all the 
areas. Among these factors the following are outstanding: 
hereditary potential, health, sensory equipment, home condi- 
tions, family relationships, community mores, political philos- 
ophy, curricular demands, educational philosophy, and the 
child's reactions to all these. Many techniques for measuring 
these varied facets of the total personality have been described 
in the foregoing chapters, and problems involved in measure- 
ment have been discussed. In view of the multifaceted na- 
ture of growth, it seems absurd to attempt to evaluate it 
with a single number, letter, or word. Yet the fact is that this 
attempt is made in what is called the "grading" or "marking" 
system. 

The authors claim no originality in condemning the prac- 
tice of attempting to summarize the many factors of growth 
with a simple number, letter, or word. Rather, our remarks 
reflect investigations by experts, critical examination of prac- 
tice, and the reported experience of many teachers. Practices 
that hold a great deal of promise for the more effective stimu- 
lation of symmetrical pupil growth are already in operation. 
However, we should not be so sanguine as to believe that the 
answer to the question, “What are the best evaluation tech- 
niques?” has been given. As more teachers depart from tradi- 
tional marking practices, better answers to this question will 
be given. Meanwhile, the departures that have been made will 
give teachers some idea of techniques which have been 
gratifying in bringing educational practice into closer accord 
with accepted child-growth theory. 


SOME SHORTCOMINGS OF GRADES 


The purpose of marks and appraisal is, theoretically, to 
foster pupil growth. They purport to tell something about the 
pupil that will make it easier for him and those who work 
with him to guide his future development. Although this is the 
theory behind grading, there are many practical reasons for 
doubting that it accomplishes this worthy aim. 

1. Marks tend to become the end and aim of education. 
William H. Burton's statement represents a consensus of pub- 
lic school workers when he asserts that a misconception of 
education is that the symbols of education are equivalent to 
the outcomes of learning. If you ask a first grader what he 
got out of school on a particular day, he will say, "I learned 
to read a story," "I learned to spell my name," or "I learned 
to print my name." If this same question is asked of a sixth 
grader he is likely to say, "I got an 80," or "I got a B." If the 
college student in general psychology is asked what he got 
from the course, he will probably indicate that he, too, con- 
fuses the symbols of learning with the products of education. 
Learning is a sufficient reward for the first grader, but the 
symbol has become more important to the sophisticate. 

1 William H. Burton, The Guidance of Learning Activities, New 
York: Appleton-Century-Crofts, Inc., 1944, pp. 52-59. 

2. Marks tend to emphasize subject matter. But among the 
aims of elementary education are understanding and practice 
of cooperative social functioning; opportunity to exercise 
habits of reflective thinking; exercise of individual capacities; 
command of the fundamental processes of reading, writing, 
and arithmetic (communication); and gaining and keeping 
good physical and mental health. Marks tend to emphasize 
only the aim relating to academic accomplishment, which, ad- 
mittedly, is very important; but this academic knowledge and 
skill is simply a tool for helping one to achieve the other aims 
on the list. Concern for marks sometimes results in an em- 
phasis upon subject matter which may actually limit the pos- 
sibility of the pupil’s attaining the other goals. 

3. Marks tend to discourage good teaching. At the risk of 
oversimplifying, we may define teaching as guiding or en- 
couraging each child to come progressively closer to realizing 
his own potentialities in all aspects of growth. Thus teaching 
involves an intimate knowledge of the children one is teach- 
ing, the development of personal ambition, social orientation, 
originality (or, at least, uniqueness), and moral and ethical 
values. Teachers who use marks may keep these objec- 
tives in mind; but some teachers employ marks as a threat 
when their teaching methods fail. If the child does not see the 
value of assigned tasks or if he is worried about out-of-school 
situations, he can still be made to conform by the threat of 
a low mark or failure. Problems of getting to know pupils, 
encouraging growth, and promoting self-realization need 
not necessarily be considered when one can use grades as a 
cudgel. 

4. Marks tend to cause teachers to overlook differences. 
Every teacher is aware of the great individual differences be- 
tween pupils; yet, all too frequently, the necessity for grading 
causes them to try to bring the slow growers academically "up 
to average." We know the futility of this attempt, yet it per- 
sists almost as a compulsion because of the pressure of grades. 
On the other hand, grades encourage mediocrity in brighter 
children because they can get satisfactory marks with the ex- 
penditure of little or no effort. 

5. Marks create a situation that is “unlike life.” It is fre- 
quently argued that grades are a lifelike phenomenon— 
that we are all graded in our commercial, industrial, and pro- 
fessional careers. We are, to some extent, graded in our voca- 
tional lives; but with definite differences. Few of us would 
freely continue to teach, to sell, to run a machine, or to build 
a home if a "big boss" looked over our shoulders each week 
and marked our cards with an A, B, C, or D. We do not need 
such prodding because each of us seeks the inner satisfaction 
of doing a job well. In fact, a great deal of the enjoyment we 
derive from work would be destroyed by a marking system 
patterned after school report cards. Another vast difference 
between school and life is that interpersonal comparisons in 
life are made between people in the same occupation. Typi- 
cally, school marks are based upon the erroneous assump- 
tion that all children are the same—that they should run the 
same course and finish at the same time. 

6. Grades tend to penalize those pupils most in need of 
help. It has been said, with some truth, that a child is most in 
need of love when he is most unlovable. We might say that 
when the child is most in need of encouragement (because a 
task is difficult for him) he is most likely to be discouraged 
by the awarding of a low grade. Frequently, it is the child 
who is working up to (sometimes apparently exceeding) his 
indicated capacity but who is below “grade level” who is fur- 
ther discouraged by a low mark. 


7. Marks have little meaning in themselves. It is a delusion 
to believe that they indicate to the pupil and his parents how 
the pupil is getting along. The truth is that, as parents, teach- 
ers, and pupils, we have become accustomed to a symbol that 
has little meaning, as with men's wearing vests. Study after 
study has shown that the same paper will be graded by differ- 
ent teachers with a different value. Some teachers give a large 
proportion of A's, and others say, “No student can be perfect, 
and A means perfect." Some teachers have in mind academic 
accomplishments alone when they give a mark; others try to 
include such factors as industry, interest, sincerity, and orig- 
inality. Many parents really have no idea how well their child 
is doing in relation to his ability or in relation to other chil- 
dren, but they are pacified by some meaningless jargon ex- 
pressed as a grade. Parents who become accustomed to an im- 
proved form of evaluation assert that they do not see how they 
could have been satisfied with the old marking system. 


Contrasts between Marks and Appraisal 

The seven items listed above are enough to indicate the 
reason for the present trend away from grades to more in- 
formative methods of evaluation. In fact, because of these 
characteristics of grades, we might even distinguish grading 
from genuine evaluation, as the following contrasts in concept 
indicate. 

Both grades and evaluation are supposedly means of com- 
munication between teacher and pupil and between teacher 
and parents. But grades are likely to be communication to, 
whereas other forms of evaluation are often communication 
with. 

Grades are ordinarily assigned on an absolute scale which 
places a high value on interpersonal competition and rivalry. 
True evaluation, on the other hand, stresses competition with 
oneself and places a premium upon the ability of the person 
to cooperate in work and play with others. 

Subject-matter mastery is the primary emphasis of grades. 
Again, it must be admitted that such an emphasis is worth- 
while, but not to the apparent exclusion of other values. 
Evaluation places subject-matter achievement in the context 
of pupil development. 

Grades are typically given at the end of a period of work 
—at the conclusion of a unit of time, and as such have little 
value in diagnostic and remedial procedures. Evaluation is 
specifically designed to capitalize upon strengths and to rem- 
edy weaknesses. The purpose of grades is to judge the person 
and his work, whereas the purpose of evaluation is to guide 
the person and his work. 

Grades often become the end and aim of learning activ- 
ities, whereas evaluation points the way to more productive 
living and learning. Grades are, at least in part, a concom- 
itant of the policy of blocking out subject matter in pre- 
scribed units, books, and courses. Evaluation is a personal 
matter, and its philosophy implies the use of subject matter 
to achieve the social ends which seem most appropriate to 
the individual. 

These contrasts are, of necessity, generalizations that will 
not always hold true for specific cases. Some teachers may use 
grades in such a way as to approach the values indicated for 
evaluation; and the various means of evaluation may be used 
in such a way that they are no more meaningful than grades. 
This lack of sharp contrast or distinct differentiation between 
grades and evaluation leads us to recommend that systems of 
evaluation be introduced gradually in the school—by taking, 
for example, the first three grades for a “test run," by confer- 
ring with a few parents at the beginning, and by frankly ad- 
mitting that the new system is experimental. But there should 
be no attempt to reconstruct, modify, or alter the traditional 
grading system because modifications can too easily return to 
the former inadequacies. The system should be abandoned be- 
cause of the dangers which seem to be inevitably attached to 
it. 

PROMISING PRACTICES IN EVALUATION 

The "promising practices" which follow are not listed in 
order of merit, for their applicability varies in relation to such 
factors as the community, the competence of the professional 
staff, and the intelligence and grade level of the pupils. 

1. Letters to parents. In place of the report card with boxes 
containing marks after "reading," "arithmetic," "deportment," 
and the like, some teachers are writing letters to the parents to 
facilitate communication between home, school, and child. 
Sometimes these letters are completely informal, indicating 
only those factors that seem to be most distinctive concerning 
the particular child. Sometimes the letters are accompanied 
by an outline which includes such items as intellectual growth, 
emotional control, social development, unique weaknesses, 
and outstanding gifts and qualities. These letters need not be 
sent on definite dates. A guiding schedule may be worked out 
so that three or four letters are sent each week, but in the 
event of need a letter may be sent well ahead of schedule. In 
the course of a year, three letters or half-a-dozen letters may 
be sent regarding one child, whereas one or two will suffice 
for another. In fact, one of the specific merits of this plan is 
its flexibility. 

2. Home visits. An excellent means of communication be- 
tween teachers and parents is for the teacher to make calls at 
the child's home. There are, however, some hazards which 
must be evaluated if the plan is to succeed. Some persons who 
live in homes which they wish were considerably better feel 
embarrassed by the teacher's visit. Hence, home visits should 
be approved by the parent before the teacher calls. For ex- 
ample, the teacher may offer to call at a specific time, but 
add, "If this is inconvenient, we shall be glad to have you visit 
us at school on [a definite date]." Many parents are more at 
ease in their everyday surroundings than in the relatively 
strange atmosphere of the school. 

2 This is the authors' opinion, but it is based on considerable study 
and reflection. It is our hope that discussion of the contention may 
lead to some fruitful conclusions. 

A good deal of the objection to home visits comes from 
teachers who are somewhat reluctant to make calls. It may 
not be an easy thing for some teachers at first; but those who 
have become accustomed to it have frequently asserted that 
they could never return to another kind of reporting practice 
[...] should be made only if it is absolutely necessary. The teacher 
should not attempt to solve problems on the first visit but 
rather to open the way to further cooperative study of the pupil. 

3. Teacher-parent conferences in the school. Conferences 
at school have essentially the same purposes as home visits. 
The particular advantage of the school visit is that the parent 
can be shown some of the child's work and the objectives of 
school activities can be more clearly explained. Test data may 
be examined and interpreted with greater exactness when it is 
deemed advisable to reveal the information. Some parents 
may prefer visiting the school to having their living conditions 
or their immediate neighborhood revealed. School visits do not 
require so much of the teacher's time, but they should never- 
theless be a scheduled activity and a responsibility of the ad- 
ministration as well as the teacher. 

Many of the suggestions for making home visits successful 
apply also to teacher-parent conferences. Additional sugges- 
tions are as follows: Do as much listening as possible con- 
sistent with not permitting the time to drag. Maintain a facial 
expression of cheer and confidence; this may seem a super- 
ficial suggestion, but it is fundamental to the success of the 
conference. Avoid any indication of shock at outmoded con- 
cepts, rough language, or questions regarding morality. Keep 
the number of criticisms to a minimum. Give only one or two 
constructive suggestions at an interview; too much advice, 
even when it is good, is likely to overwhelm. If one of the 
parents is criticized by the other, do not take sides on the is- 
sue even if one is clearly wrong. 

4. Self-appraisal. Self-appraisal is a difficult thing. Some of 
us are too critical of ourselves, and others are too lenient in 
self-evaluations. Nevertheless, the development of this skill 
is an exceedingly important educational objective, since it is a 
major factor in the individual's vocational success. Practice 
may be begun in the first grade. The naturalness of self-evalu- 
ation is reflected in such statements by children as "This is 
good” or “This is no good" in referring to blocks they have 
piled up, pictures they have drawn, or games they are playing. 
Unfortunately, this tendency is curbed by grades and marks, 
which make the child dependent upon the teacher's evaluation 
to the extent that the one criterion of success becomes ac- 
ceptability to the teacher. Teachers can help promote self- 
appraisal by commending effort and its products or by asking 
the pupil if the work could be improved. If the teacher feels 
that the youngster is wrong in his appraisal, he should not try 
to impose an evaluation. Skill in self-appraisal, like other 
human traits, is the result of growth and development. 
Self-appraisal is not an ability that flourishes when exer- 
cised at six-week intervals; it requires daily practice. The 
teacher should record the pupil's oral efforts at evaluation and 
encourage the pupil to compare his present efforts with his 
past work. Group discussion, even in the primary grades, can 
help children achieve better self-evaluation. Classmates' praise 
or censure of the child's conduct and work stimulates him to 
make his own evaluation. As children progress through the 
grades, it is advisable that some of the evaluations be written. 
The child may write a letter to the teacher or parent regard- 
ing his evaluation of his performance in social and academic 
functions in the classroom. 

3 Faith Pascal, "When the Child Makes His Own Report," Records 
and Reports, Bulletin 77, Washington: Association for Childhood 
Education International, 1942, p. 29. 

Self-appraisal obviously cannot replace other forms of 
evaluation. As a supplementary device, however, it is worthy 
of careful trial because it contributes so much to the imple- 
mentation of democratic theory. 

5. Teacher-pupil conferences. Conferences between pupil 
and teacher are really an aspect of pupil self-evaluation, with 
a difference in emphasis. In pupil self-evaluation the teacher's 
role of adviser is held to a minimum and the relationship re- 
sembles client-centered counseling. Teacher-pupil evaluation 
is based on the belief that there is value in a give-and-take 
relationship. If the teacher has something critical to say, he 
will say it, always, of course, with the view in mind of pro- 
moting pupil growth. Teacher-pupil evaluation gives open 
recognition to the responsibility of the teacher for positive 
leadership. 

In a way, this technique falls short of pupil self-appraisal, 
but it is a long step beyond the teacher's "giving" a grade. It 
brings the pupil more directly into the evaluation process, 
which is an integral part of the learning process, than do the 
traditional practices. 

6. Teacher-pupil-parent conferences. Our discussions of 
parent conferences and pupil self-appraisal have anticipated 
consideration of this technique. The value of home-school con- 
tacts is widely recognized, but quite frequently this recognition 
seems to ignore the child, or at least to treat him as if he were 
a disinterested part of the entire procedure. There are prob- 
ably times when it is advisable for the child to be kept unin- 
formed about some matters bearing on his welfare. It is 
probable, however, that these situations occur less frequently
than many anxious teachers and parents seem to believe. If
reporting is a manner of communication, then it seems logical
to admit that the pupil must be intimately involved.

The clear advantage of the three-way conference is that the
likelihood of misunderstanding is reduced. Certainly, there is
much room for misunderstanding when percentage grades and
letter tags are used. It has been shown that there is relatively 
little consistency in the use of grades among teachers, and 
their meaning is likely to vary still more among those who 
deal with them less frequently. But if there is a lack of under- 
standing between persons in a conference, questions will stim- 
ulate an answer that might lead to clarification. 

The observations of both parents and teachers leave little 
doubt that the three-way conference increases understanding 
of the pupil. The parent knows his child better for seeing him 
in action in another part of his environment. The teacher 
knows the pupil better for seeing him in contact with the other 
adults who so greatly influence his life. 

Some of the hazards and shortcomings of this method of 
appraisal are the following: The negative attitude of the 
teacher who feels that it is an imposition on his time to con- 
duct these conferences is a very real obstacle to the success 
of this method. Also, holding conferences may cause teachers 
to neglect other means of evaluation, such as cumulative 
records and personnel cards, because no record is kept of the 
interview, although records are an important responsibility of 
the school. The technique will be increasingly difficult to use 
as the pupils progress through the grades, because departmen- 
talization of instruction puts the pupil into contact with more 
and more teachers who know him less and less intimately. 
This defect is, however, no greater than the hazard of giving 
a child a grade on the basis of superficial knowledge about 
him. The teacher-pupil-parent conference is not a panacea 
for the problems of evaluation in education. It is another 
means of communication; but examinations, inventories, pro- 
jective techniques, cumulative records, and staff conferences 
are also a part of the process of evaluation. 

7. Cumulative records. The teacher’s conviction that the 
child should be accepted for what he is should not blind him
to the fact that it is informative to know how the child
achieved his present status. Quite frequently, considerable
concern about the growth and status of a child would be
allayed if the teacher could but picture clearly what the child
was a few months earlier. When one sees a youngster every
day, it is easy to overlook the minute increments of growth
which add up to an encouraging total. The cumulative record
can help teachers to see the child's progress more clearly.

It would seem desirable to have a nationwide standard
cumulative-record card or folder, or at least a uniform card 
for use within each state. Such a card would facilitate the un- 
derstanding of a pupil as he transfers from one school to an- 
other. Since we do not have standard cards, either nationally 
or within states, it is feasible for each school system to work 
out its own card, thus ensuring the recording of those data 
which are most important for the particular school system. 
The selection of the type of data to be recorded is not an easy 
task. One suggested guiding principle for making a cumulative 
record is to keep the information to a minimum, for a mass 
of data is discouraging to the teacher. System and regularity 
in keeping the record will make up for some deficiencies in 
the amount of information noted. With these observations as a 
starting point, it is recommended that the record include the
following items:

Personal Data. Name; sex; birthplace; date of birth; father’s
name, nationality, occupation; mother's name, nationality,
occupation; family status (married, divorced, etc.); siblings
(sex and age); and language spoken in the home.

Chronology and Address. Several lines will be needed for 
changes of school and address: date entered, class and grade, 
name of school, home address, phone, and significant remarks
about the home.

Conference Notes. Notes on the outcomes of conferences,
with the date and grade status of the child at the time of the
meeting, should be a part of the record. Care must be taken
to avoid the temptation to record anything that is unneces- 
sarily derogatory about the home, the child, or the parents; 
for example, “We have asked the Kiwanis to give Don a pair 
of shoes” is preferable to “Mr. B. has made no effort to see 
that Don is properly shod.” Or, “Whenever possible, we should 
keep Don after school and let him work at the projects he 
likes so well” is better than “Don’s choice of after-school
companions is consistently bad.”

Record of Attendance. This should include terms, dates, 
punctuality, and school progress. 

Achievement-test Data. These must be complete to be 
meaningful. They might include date, grade, name of test, 
subject, form, and grade placement or other standard results 
such as percentile rank and standard score.

Intelligence-test Data. Date, grade, name of test, form, 
chronological age, MA, IQ, examiner (if an individual test), 
and other standard results should be included. 

Significant Behavior or Personality Observations. Again,
care must be taken to avoid unnecessarily derogatory remarks.
The purpose of the whole field of measurement and evaluation
is to help the child. It is doubtful if a recording of negative
data that might prejudice the next observer will serve this
purpose.

Anecdotal Reports. These
previous heading,
Space a separate h

In some cumul


Experimenting with Evaluation Techniques


It is all too evident that no single perfect method of evalua- 
tion has yet been devised. It is therefore recommended that 
each school have a committee to deal with evaluation prob- 
lems, experimenting with techniques devised by members of 
the staff or adapting techniques in use elsewhere. Evaluation 
so intimately affects the entire operation of the school— 
curriculum, methods, promotion policy, philosophy—that it 
constitutes an effective focal point for critical examination of 
the entire school. 

Many lists describing the effective teacher have been formu- 
lated, but one criterion is always present in some form: the 
good teacher is learning, growing, or progressing. Time spent 
on local problems of evaluation will be fruitful from the stand- 
point of teacher growth. Incidentally, it should be mentioned 
that the success of any of the techniques mentioned in the 
foregoing section (letters, conferences, cumulative records, 
student self-appraisal) will depend first of all upon the teacher's 
acceptance of the idea. Improvements in the techniques will 
also depend upon the teacher's acceptance and understanding. 
William L. Wrinkle, after a careful study of many practices
in evaluation, concludes:*


Perhaps no final bit of advice would be more appropriate . . . 
than . . . the following statement made by Franklin D. Roose- 
velt in his 1932 Baltimore address: “Do something; and when you 
have done something, if it works, do it some more; and if it doesn't 
work, do something else.” There is one very happy aspect involved 
in attempting to bring about improvement in marking and report- 
ing practices—whatever you may do has little likelihood of being 
more objectionable or less adequate than the practice it replaces. 


* William L. Wrinkle, Improving Marking and Reporting Practices,
New York: Rinehart & Company, Inc., 1947, p. 115.

If we reflect upon the purposes of the school and upon the
purposes of evaluation, we immediately remember that edu- 
cation is a common enterprise involving the home, the com- 
munity, the child, the teacher, and the administrator. In con- 
sidering ways to improve evaluation procedures, it is heartily 
recommended that a committee or informal group be called 
together to discuss some of the problems. This group should 
consist of some parents who have evidenced interest in the 
school, some citizens who are willing to devote some of their 
time to the problem, a student or two who have the ability to 
speak with clarity (they need not necessarily be the brightest 
in the class), some teachers who can resist the temptation to 
dominate a discussion of educational problems, and an ad- 
ministrator. This group can consider the purposes, methods, 
and tools of evaluation. If changes in the system are war- 
ranted, it will be helpful to have a group of parents, citizens, 
and children who will serve as a vanguard in the job of in- 
terpreting the changes to the community. 

The technique of involving parents, citizens, and pupils in 
a consideration of school problems has been tried in many 
localities, and the consensus of school workers who have
evaluated such groups is that they are indispensable. Sometimes
good practices have failed because of inadequate interpretation
to the public. When a committee is called together, the urgency
of this phase of forward movement is made so apparent that it
cannot be overlooked.


ELEMENTS OF GOOD APPRAISAL PRACTICE 


It has been shown that present methods of appraising pupil
development and progress in the school are open to serious
criticisms, and as yet, no universally acceptable substitute for
the questionable practices has been devised. Marked improvements
are possible, however, by means of techniques now
being developed. The application of these techniques has led
to the formulation of the following basic principles of evaluation:


1. Pupil evaluation is a means of communication between
the school, the child, and the home. As such, it must be
meaningful to all concerned.

2. The purpose of evaluation is to promote optimum
growth. An indication of status is not enough.

3. Evaluation should indicate what steps should be taken
next. A statement of desired behavior is an inherent
responsibility of all the evaluators.

4. Appraisal should be in terms of individual accomplishment
and not in terms of interpersonal comparisons.

5. Evaluation should be in terms of the stated objectives of
education for the school level concerned. The demands
of such groups as college professors, registrars, or employers
should not shape the entire evaluation practice
at any level.

6. Evaluation should be a continuing process. It is not an
end in itself at any point in the pupil’s growth.

7. Objective data are necessary, but these data are always
relative to the living, dynamic person.

8. Alterations of evaluation procedures involve the entire
philosophy of the school and must therefore be a matter
for serious study. Change should take place only as
rapidly as those concerned are convinced of its value.


SUMMARY


The object of appraisal practices is communication between
teacher and pupil or between teacher and home. Because
appraisal relates to all phases of a pupil's growth, it is difficult,
if not impossible, to find a common denominator. But because
appraisal is so intimately connected with pupil growth,
the study of evaluation practices stimulates examination of
education as a whole—its philosophy, methods, curricula, and 
materials. 

Grades and marks are open to criticism because of such 
factors as the following: Marks tend to become the purpose 
of education for the pupil; he works for the grade. Marks tend 
to stress subject matter as a primary aim, whereas pupil de- 
velopment or pupil self-realization is the transcending aim of 
education. Marks tend to become a cudgel for the inept 
teacher, who uses them for incentive instead of establishing 
more persistent motives. Grades tend to force all youngsters 
to progress at the same speed and to conform to one mold. 
They are a threat to the maximum enjoyment of school, since 
slow pupils persistently tend to get low marks and particularly 
able pupils are likely to get good marks without learning the 
valuable habit of rigorous application. Actually, marks have 
little meaning because of the different values teachers assign 
to them and because those who look at the marks "read"
them differently. 

There are a number of contrasts between grading and other
appraisal practices, contrasts that indicate the tendencies of
the two practices. Some o

With versus communication to; co

teachers and parents endorse this kind of contact. Teacher-
parent conferences in the school have such inherent advantages
as economy of the teacher's time, availability of cumulative 
data, and introduction of parents to the materials and meth- 
ods with which the pupil and the teacher are familiar. Pupil 
self-appraisal should be an objective of all evaluation, and 
pupils must be given specific opportunity to practice self-ap- 
praisal. Teacher-parent-pupil conferences combine many of 
the advantages listed above and have the additional advantage 
of consciously bringing the pupil onto the scene. These various 
means of communication do not erase the need for cumulative 
records, which permit communication between various per- 
sons related to the pupil at successive periods of time. 

No universally acceptable appraisal practice has yet been 
devised. Each local school system must plan its own most 
effective evaluation procedures. This is arduous work involv- 
ing the coordinated efforts of teachers, pupils, parents, citi- 
zens, and administrators. The effort will be fruitful, however, 
not only because appraisal facilitates pupil growth but because
the improved communication will result in better general
educational practice.


STUDY AND DISCUSSION EXERCISES 

1. Teachers often say, “We'd like to change, but the parents
will not let us.” Evaluate this statement by having a team of
students who are studying this book interview ten teachers and
ten parents. Contrast the degree of acceptance or repudiation of
change for each group.

2. Do you agree with all the so-called shortcomings of grades
that are listed in the chapter? Can you think of any other
objections or advantages that have not been mentioned?

3. Has your own experience been one of satisfaction or
dissatisfaction with grades?

4. Divide the class into groups and have each group draw up 
a list of ways of implementing each of the suggestions for improv- 
ing appraisal practices. Bring the list to class for criticism and 
further suggestions.

5. Compare cumulative record cards or folders from several
schools as to merits and shortcomings. 

6. Consult some recent educational periodicals to see if there 
are any recent reports on the value of newer appraisal practices. 


SUGGESTED ADDITIONAL READINGS 


Association for Childhood Education International: Records and
Reports, Bulletin 77, Washington: The Association, 1942, 32 pp.
Different phases of the problem of evaluation and reporting
are discussed in this pamphlet by school workers with practical
experience. Various views of pupils, parents, and teachers are
represented.

Elsbree, Willard S.: Pupil Progress in the Elementary School,
New York: Teachers College, Columbia University, Bureau of
Publications, 1943, 86 pp.
The last two chapters in this booklet deal in scholarly detail 
with contemporary trends in the marking system and reporting 
to parents. The author's list of trends in reporting indicates some 
of the things one needs to include in his thinking about 
evaluation. 

Smith, Eugene R., Ralph W. Tyler, et al.: Appraising and
Recording Student Progress, New York: Harper & Brothers, 1942,
550 pp.
This is volume III of the series “Adventure in American
Education,” which deals with the widely known “Eight-year Study”
or “Thirty-school Experiment.” This book describes how
evaluation was carried on in such areas as thinking, appreciation,
personal and social adjustment, and interests.


Wrinkle, William L.: Improving Marking and Reporting Practices
in Elementary and Secondary Schools, New York: Rinehart &
Company, Inc., 1947, 120 pp.
This book is based on ten years of experimenting with better
evaluation practices. The interdependence of appraisal, reporting,
and educational practice and theory is recognized, and
specific suggestions are made for tentative departures from
traditional practice, but the author makes no pretense of having
discovered a panacea.

CHAPTER FOURTEEN 


Toward a Planned Program of Evaluation


Each chapter of this book has been devoted to a phase of the
total testing program. The view presented in this book is that
tests are samples of behavior which help the teacher to get a
better view of the pupil in order to facilitate his future
development. Both the uses and limitations of tests have been
described in order that teachers may capitalize fully upon the
values each test possesses. The specific problems involved in 
testing ability, estimating achievement, appraising personality, 
and evaluating classroom status have been discussed with a 
view to helping the teacher see how each device may be made 
to contribute to pupil development. 

It has been necessary in this book to expand in later chap- 
ters concepts that were mentioned in earlier ones. By this pro- 
cedure the concepts and skills needed in effectively utilizing 
tests have been presented by means of a spiral development. 
In this final chapter these concepts will be exemplified in a 
proposed testing program. This suggested program will pro- 
vide a point of departure for those who would like specific 
advice regarding the development of their program. 

This program is regarded as a minimum; without these
basic tests effective instruction will be unnecessarily difficult.
Although it is possible to do too much testing, it is advisable
to use more tests than are included in this suggested program.
Test results, as we have seen, are to be regarded as supplementary
and corroborative data which are more likely to fulfill
their purpose when they are correlated with information from
other sources: teacher observation, cumulative records, and
past school performance.


Determining the Objectives 


In Chapter 3, "Choosing the Right Test," it was indicated 
that tests can be used for a variety of purposes. The superin- 
tendent or principal may wish to know more about the ap- 
proximate level of ability of pupils in the school system, and 
the extent to which pupils are capitalizing on that ability in 
terms of academic achievement. The supervisor may wish to 
use standardized tests to evaluate the effectiveness of instruc- 
tion in the area of his jurisdiction. A one-session test will yield 
only a minimum of information; better results will be obtained 
when successive tests are used to give data regarding develop- 
mental trends. Thus it is clear that planning is necessary in 
even a simple situation. 

We have seen that test data can be used to make instruction
more effective. In order that these objectives be clearly
stated and adequately understood, it is desirable that teachers
participate in the meetings devoted to their formulation and
clarification.

Achievement of the objectives of the testing program will
depend not only upon (1) the teachers’ understanding the
objectives, uses, and limitations of tests and (2) their conviction
that some educational values will result, but also upon the
operation of the program. How well the program operates will
depend upon (3) the choice of the most appropriate tests (see
Chapter 3) and upon (4) a workable system for maintaining
the results, that is, a simple cumulative record. Unless the
entire staff has participated in the formulation of objectives
and plans for operating the program, it is desirable that
(5) some training sessions be devoted to the administration,
scoring, and interpretation of results.


Reading Readiness 

The minimal testing program suggested here has as its
objective helping the teacher understand the pupils rather than
revealing to the administrator the status of the pupils or
helping the supervisor evaluate instruction.

In the first grade, the information most vital to the teacher 
is whether or not the pupil is adequately prepared to begin 
his study of reading. The best test for this purpose is one
designed to estimate readiness rather than to measure
intelligence. There are several reasons for this. Group intelligence
tests at best yield only approximations of intelligence, and
when given in the lower grades are even less dependable.
First-grade pupils are too small to grasp the importance of
their task and too lively for prolonged periods of concentration,
and successive scores on individual tests show that this
type of intelligence test is more dependable for older pupils
than for preschool and primary pupils. Hence it is probably
wise to delay giving group intelligence tests until the pupil
has become accustomed to his new environment and can give
a more accurate account of himself. Further, it may do the
youngster some harm, in view of the general failure to understand
the meaning of test scores, to record in his cumulative
record a score that might later cause some teachers to misjudge
his ability.

Reading-readiness tests are subject to the same criticism of
unreliability as group intelligence tests; but as applied to
readiness tests the criticism is less serious. The results are for
temporary use only and will not be considered after
the passage of a few months. In addition, the reading-readiness
test has some value in diagnosis as well as in determination
of status. That is, although it is informative to know that a
child has a certain mental age (mental age is an important
factor in readiness), it is possible that what appears to be
adequate mentality may for a given child be composed of factors
that are less directly related to reading ability than are the
factors that result in the same mental age for another child. The
readiness test, on the other hand, samples those abilities that
are most directly related to the task at hand: learning to read.
If the test is divided into parts, the relative standing of the
child on the various parts may suggest the most suitable program
for developing readiness in a particular child. For instance,
the Reading-aptitude Test devised by Marion Monroe
has sections devoted to the evaluation of visual, auditory, and
motor functions as well as articulation, use, and comprehension
of language. A low score on any one part suggests what
may be done to promote the development of readiness.

Readiness tests should be given in the first grade, during the
first three weeks of school. The teacher will then be able to
offer those pupils who are ready for reading the experiences
they are anticipating and to delay reading for those who, if
started too early, would suffer the disappointment and
frustration of failure.


Physical Examinations 


The results of the readiness test may indicate the need for
a physical examination to reveal remediable visual or auditory
defects. The routine examination of tonsils, teeth, eyes, and
ears is not enough. Physical status is as variable as
intelligence-test results at this stage of development; hence, in
addition to the periodic testing of children by audiometrists,
physicians, ophthalmologists, and school nurses, the teacher must
be persistently alert to the symptoms of visual difficulty, audi- 
tory difficulty, and acute and chronic infection. Representa- 
tive of the telltale symptoms for visual difficulty are squint- 
ing, excessive blinking, twisting the head when looking at the 
chalk board, watering of the eyes, sties and granulated eyelids, 
attempts to brush material off the printed page, and bending 
abnormally close to a book. Common symptoms of auditory 
difficulty are turning the side of the head toward the source 
of sound, inattentiveness, boredom, cupping the hand behind 
the ear, ignoring simple requests and questions, complaints 
of buzzing in the ear or of earaches, speech defects and odd 
voice quality, and sometimes seclusiveness and poor school- 
work. Indications of acute or chronic infections may include 
many of the above symptoms, such as listlessness and inatten- 
tiveness, as well as frequent absences from school, drowsiness, 
lack of interest in play and schoolwork, and irritability. 

These symptoms, like the scores on standardized tests, must 
be taken as informative data to be supplemented by cor- 
roborative evidence. The teacher does not diagnose on the 
basis of symptoms, but his awareness of them will make for 
earlier and more frequent referral of pupils who might profit 
from special medical attention. 


Diagnostic Reading Tests 


In the second and third grades, the major academic concern 
of the pupil and teacher is reading. Some children may not 
yet have accomplished the development that would indicate 
a general readiness for reading. Tests may show that others
are psychologically ready, but actual performance may reveal
achievement which falls short of their indicated ability. In
order to save these pupils from the trauma of repeated and
continued failure, it will be well to discover what their specific
difficulties are. Early correction of remediable difficulties will 
prevent the development of the "reading block" so frequently 
referred to. Blocks against reading are, for the most part, dis- 
like generated by chronic failure or a strong conviction that 
one just cannot learn to read. 

Diagnostic tests will help the teacher to determine whether 
the pupil is having difficulty in one or more of the following 
areas of specific reading factors: recognition of visual like- 
nesses and differences in printed phrases, ability to analyze 
words, recognition and understanding of spoken words, ade- 
quacy of reading vocabulary, interpretation of the message 
contained in sentences and paragraphs, understanding of fac- 
tual data, method for attacking new words, and skill in the 
use of tables of contents (a minor concern in the primary 
grades but of increasing importance at the upper grade levels). 

The diagnostic reading test may be given during the first 
two or three weeks of school and used as a source of informa- 
tion for work with children as individuals and in groups. All 
pupils may profit from wise use of the results of the diagnostic 
test. The information may suggest ways to help the able student
make even better progress and thus provide motivation
for continued development.


Group Intelligence Tests 


As the teacher diagnoses reading difficulty, he thinks im- 
mediately of the mental ability of the pupil. It is possible that 
mental testing should take precedence over diagnostic testing 
in the second and third grades. If both cannot be done, it will 
be up to the teacher or the testing committee to decide which 
will be of greater immediate value.


* Teachers selecting tests might keep these points in mind as they
read the publishers’ manuals, catalogues, and reviews of diagnostic
reading tests.

Although the evaluation of general mental ability may not
be the factor of prime importance in the primary grades, it 
becomes of greater interest in the intermediate grades. Hence 
it is recommended that the testing program include a series of 
group intelligence tests, beginning in the third grade. Some 
schools administer these tests in the third, fifth, and seventh 
grades. However, since group intelligence tests are of ques- 
tionable accuracy and the rate of mental development is still 
variable in the elementary school years, it seems desirable 
to test in each grade if it is financially feasible. 

The tests should be given during the fifth or sixth weeks of 
school rather than immediately. The pupils should be allowed 
time to settle down after the vigorous activities of their vaca- 
tion, and new pupils should be given time to acquaint them- 
selves with their new human and physical surroundings. 
(Pupils who enter during the school year should be given an 
intelligence test after they have had time to become ac- 
quainted.) The middle of the week will probably be the best 
time, but the test period should not coincide with a fall fes- 
tival, school party, or athletic contest. 

As we saw in Chapter 1, the teacher should respect his own 
skepticism if test results do not accord with his observation of
the pupil. In the event that a score seems too low for a
particular pupil, he should give an equivalent form of the test or,
if a psychometrist is available, an individual test. Whether the
score has surprised the teacher or not, it will be well to
compare the pupil's present score with records from his previous
school years.

General-achievement Tests 

By the time the child has reached the fourth grade, he 
should be beginning to acquire the informational data he
will need to live effectively. His interest should begin to shift
from reading, writing, and computation to such subject-matter
fields as geography, language, spelling, and social studies. In
the intermediate grades, interest shifts from the acquiring of
the tools of learning to practice in their use. However, this 
does not imply that all pupils have learned the skills so thor- 
oughly that the “fundamental processes” can now be neglected. 
As we shall see in the next section, there should be continued 
emphasis on improvement of the skills throughout the ele- 
mentary and secondary school years. 

Effective use of achievement tests by teachers requires that 
data be available regarding pupils’ ability. Intelligence tests 
give an indication of the pupils’ present intellectual status, 
whereas achievement tests give evidence of how effectively 
the pupils are using their ability. However, as was indicated 
in Chapter 6, “Evaluating Pupil Achievement,” high ability 
does not mean that the pupil should necessarily achieve at a 
high level; health, home factors, personality and social problems,
past experiences, and the number of current out-of-school
activities must be considered in interpreting the data.

In addition to indicating to the teacher whether the pupil’s
achievement corresponds to his indicated capacity, achievement
tests help to evaluate the effectiveness of instruction. If
the average ability of all pupils is near the national norm and
achievement in language and reading is also close to the 
national norm while achievement in arithmetic and spelling is 
below average, it is possible that techniques of instruction in 
these two subjects should be examined. Individual teachers 
may find areas which they think, in terms of class averages, 
will need particular emphasis throughout the year. However, 
a class average above the norm in language does not necessarily
indicate superiority of teaching; it may simply be a matter
of the school’s being in a superior neighborhood. Thus
interpretation of data is essential.

If only one battery of achievement tests can be given per 
year, October or November is probably the best time. Pupils 
have had an opportunity to settle down and to be reminded
of some of the things they may have forgotten during vacation,
and there are few holidays to interfere with characteristic emotional
stability. If the tests are given at this time, the teacher
has had a chance to evaluate the pupils but still has enough 
of the school year remaining to benefit from the guidance of 
the test data. It would be highly desirable if an equivalent 
form of the test could be given in the spring to provide a basis 
for the student's evaluating his own progress and to give the 
teacher a chance to judge the effectiveness of his teaching. 


Upper-grade Reading Tests 


Most teachers appreciate the fallacy of the saying, "Practice
makes perfect." It is much closer to the truth to say that
one learns by doing. The quality of one's reading probably 
tends to improve if he does a great deal of reading. However, 
experimental investigation of reading also reveals that when 
one reads extensively he may simply fix more firmly the habits 
he has already developed. Maximum improvement will come 
with directed, correct, and purposeful practice. The importance
of continued reading instruction in the intermediate and
upper grades is emphasized by the fact that at the age of 
about twelve or thirteen, interest in reading reaches its highest 
point. Furthermore, it is at this age that interests shift from 
the juvenile to material which is of interest to adults. Data 
from educational psychology indicate that the better the skills 
are taught at this crucial period, the greater the likelihood that 
interest in reading will continue at a high level. 

Time should be regularly scheduled for the development 
of silent-reading skills. A silent-reading test will indicate the 
areas that are in need of particular attention and provide a 
strong source of motivation for steady application, which is
perhaps still more important. Some of the factors which a
silent-reading test might evaluate are comprehension of paragraph
meaning; appreciation of the organization of ideas (key words
and phrases); ability to locate information; skill in
using indexes, tables of contents, references, etc.; and rate of
reading. Many teachers have experimented with giving one 
form of a silent-reading test at the beginning of a six- to 
twelve-week period of special instruction and the equivalent 
form after the planned exercises have been completed. These 
experiments have been uniformly highly gratifying; students 
have shown gains as high as 50 to 100 per cent, often with 
average gains of 50 per cent in rate and comprehension of 
reading. (It should be noted, however, that unless there is 
some continuing emphasis on the elements of good reading, 
the pupil will tend to regress toward the level of his former 
reading habits.) Individual differences in ability and motiva- 
tion will also influence the variation in achievement. 

The importance of including silent-reading tests in the mini- 
mal testing program is indicated in the following passage:* 


One of the most important of the modern advances in teaching 
methods is the tendency to force elementary school and high school 
students to read widely in many fields. Instead of confining the
students' reading to a few textbooks relating to a limited number
of topics, the progressive school provides for and demands a wide
range of reading activity. Furthermore, the solution of most class- 
room problems in the modern school requires the skillful use of 
books as sources of information. In this sense, reading comes to 
ely rapid comprehension of printed 
organizations of materials read. It 


Sources of information. 


Evaluation of Personal and Social Adjustment 


In our discussion of the uses and abuses of tests and inventories
of personality in Chapter 8, "Appraising Personality,"
we saw that the defects of these instruments may sometimes
outweigh their possible advantages. However, since personality
and social factors are of major importance in the
classroom, the cautious use of projective techniques and
sociometry (see Chapter 9) may offer the teacher some
help in handling personality problems.

* H. A. Greene, A. N. Jorgensen, and V. H. Kelley, Manual, Iowa
Silent Reading Tests: Advanced Test, Yonkers, N.Y.: World Book
Company, 1931, p. 1.

Ink-blot, cloud, and picture-interpretation tests as means 
of evaluating personality should be used only by those who 
are specially trained to interpret them. Even in the hands of 
experts, these tests reveal both the strength and weakness of 
projective techniques; i.e., the subject puts himself into the
test and the examiner projects himself into the interpretation 
of results. Some projective techniques can be of value if the 
teacher bears in mind this tendency of the person who ad- 
ministers and interprets the test to make unique inferences. 
Observation of children at play is recommended, not with the 
aim of policing but to see how the child orients himself to 
others, what his view of self is, and what his abilities are. The
writing of themes, stories, and compositions is recommended; 
recurrent emphases or ideas, when corroborated by other 
sources of information, can give teachers clues, not data, on 
personal and social adjustment. Drawing, painting, and finger- 
painting also may afford some clues as to the child's emotional 
patterns. The teacher should remember, however, that special 
training is required for adequate interpretation, although he 
may gain a deeper understanding of the pupil through the 
cautious study of his creative products. 

Something of the value of sociometry is revealed by the very 
common remark of teachers, "I was surprised by the dif- 
ference between what I thought were the interpersonal likes 
and dislikes and what was indicated by the sociogram." These 
new insights can be of great help to teachers in developing 
seating and working arrangements for pupils in the classroom
group. The teacher must remember that, as we saw in Chapter
9, the interpersonal constellations will shift with the passage
of time and with changes in the situation. Therefore socio- 
metric designs should be redrawn as the occasion demands. 


Subject-matter Tests 


Tests that possess some of the advantages suggested in the 
section dealing with silent-reading tests exist for other areas 
as well. There are arithmetic tests which give some indication 
of specific areas of strength or weakness, i.e., addition, sub- 
traction, division, multiplication, or particular number com- 
binations such as the misapprehension that six times seven is 
forty-four. English-usage tests are available which yield sim- 
ilar diagnostic information. By means of such tests, much 
time can be saved by avoiding repetitious general drill when 
a small amount of drill on a specific detail would suffice. 

There are many tests in the social studies, and their com- 
position varies with the purposes of the test constructors. 
Sometimes the emphasis is upon the mastery of information; 
since facts are the basis of sound thinking, this is a justifiable 
emphasis. Other test makers, however, place primary stress 
upon the use and interpretation of data and upon techniques 
for acquiring information. The individual or committee re- 
sponsible for test selection will have to determine which kind 
of test will best fit the objectives that have been stated for the 
particular school concerned. 

It is probably best to administer subject-matter tests near 
the beginning of the school year, though the time will depend 
upon the specific purposes of the test. In addition to a planned
succession of tests on a schedule, it should be possible to test
at irregular times transfer pupils and pupils who were absent
at the time of regular testing. If data are not available on a
child when needed, teachers may become discouraged about
the testing program and abandon the advantages that could
accrue from a dependable program.


A Check List for the Testing Program


A number of considerations are involved in obtaining the 
best results from a testing program. The following check list 
will provide guidance in determining responsibilities and 
duties and anticipating difficulties: 


                                                              Check
1. Purposes of the program
     Clearly defined                                          ____
     Understood by parties involved                           ____

2. Choice of tests
     Valid                                                    ____
     Reliable                                                 ____
     Appropriate difficulty level                             ____
     Adequate norms                                           ____
     Easy to administer and score                             ____
     Best available for purpose                               ____

3. Administration and scoring
     Administrators well trained                              ____
     All necessary information provided                       ____
     Scorers adequately instructed                            ____
     Scoring carefully checked                                ____

4. Physical conditions
     Sufficient space                                         ____
     Sufficient time                                          ____
     Conveniently scheduled                                   ____

5. Utilization of test results
     Definite plans for use of results                        ____
     Provision for giving teachers all necessary help in
       using results                                          ____

6. System of records
     Necessary for purpose                                    ____
     Sufficient for purpose                                   ____
     Convenient form for use                                  ____

7. Personnel
     Adequately trained for the purpose                       ____

8. Affiliated research
     Provision for special studies, analyses, etc.            ____

* Roger T. Lennon, “Planning a Testing Program,” Test Service
Bulletin, no. 55, Division of Test Research and Service, Yonkers,
N.Y.: World Book Company, p. 3.


It is likely that it will be possible to check more of the 
items in this list if planning for the program has been a co- 
operative affair. This group approach should include staff 
members' sitting in the meetings having to do with test selection,
defining purposes, constructing the cumulative record,
and planning the other details. This approach will do more
than strengthen the testing program. It can be a means of
welding the faculty into a stronger corps and a means of promoting
individual teacher development.


Informal Evaluative Techniques 


We have seen that formal and standardized tests constitute
a substantial part of the program of evaluation but not the
entire program. There are also informal techniques of evaluation,
such as anecdotal records, rating scales, and observation,
which provide valuable, though not statistically accurate, data.
Creative writing, drawing, and painting are also useful in
evaluation of pupil behavior.


Anecdotal records are valuable supplements to the evaluation
program, but like standardized tests, they require the exercise
of skill in use and caution in interpretation.* (1) The
anecdotal record should be a s

* Helen Bieker in Fostering Mental Health in Our Schools, Association
for Supervision and Curriculum Development, Washington, D.C.:
National Education Association, 1950, pp. 184-202.

avoided; for example, “Tommy sharpened his pencil five
times today between 2:15 and 3:10, each time poking or 
brushing some other pupil on his way to the sharpener" rather 
than “Tommy’s resistance to order and routine is revealed in 
his chronic tendency to irritate others." Teachers should 
avoid making immediate interpretations of behavior, since the 
value of the anecdotal record is in tying apparently discrete 
bits of behavior together into a pattern and affording perspec- 
tive on the child's growth over a period of time. Of course, the 
record may also be used to study a particular child who is 
experiencing difficulty in adjustment; then the account may 
describe some particular behavior or situation which appears 
to be characteristic. 

In the evaluation of the social effectiveness of a pupil, 
rating scales may prove to be significant. Since it is important 
to know what others think of an individual in order to point 
the way to personal and social improvement, the rating scale 
will provide clues to approaches. The more effective rating 
scales will deal with specific situations rather than intangible 
personality traits. The individuals doing the rating must know 
one another rather intimately. Since raters differ in the severity 
or leniency of their judgments, the final results are tentative 
in nature. Since relationships change with length of acquaint- 
ance, the results have temporary value only. Inasmuch as the 
sociometric design is closely related to the rating scale, these 
same precautions and reservations should be heeded in using
sociograms.

Informal observations may prove to be a valuable supple- 
ment to the evaluation program—if the teacher can imagine 
himself in the role of a psychologist rather than that of a 
policeman. By doing more listening and less talking and by
avoiding the show of shock, the teacher may obtain valuable 
clues to pupil behavior. Too frequently, though, teachers are 
full of good advice and the tendency to chide: “Tsk, tsk. You 
don’t really mean that.” Taking time for observation in the
classroom, on the playground, in the gym, and in all school 
activities will result in insights that will go much further to- 
ward changing undesirable pupil responses than the show of 
disapproval or the autocratic blocking of wayward conduct. 


Recording and Reporting 


The testing program will lose a substantial part of its value 
unless careful records are kept. Too frequently tests lose much 
of their value because they are used to find status rather than 
to indicate pupil development and progress. It would be de- 
sirable to have a somewhat uniform cumulative record used 
in different schools so that when pupils transferred the data 
that accompanied them would be readily understood. No 
less desirable in the cumulative record is brevity. It should be 
short in order to avoid overwhelming the teacher with facts and 
figures and to prevent the teachers’ spending hours on the 
clerical detail of recording. Spaces should be provided on the 
card or folder for personal data (name, sex, birth date, etc.), 
address, chronology of schools attended, achievement-test 
data, intelligence-test data and special test data (diagnostic, 
aptitude, etc.). In addition the folder may contain a few care- 
fully selected conference notes, reports of observations, and 
anecdotal records. It should be kept in mind that the purpose 
of the cumulative record is to facilitate the adjustment of the 
child in his next school or to his next teacher.

The maximum benefit of a well-planned and well-executed 
program of evaluation cannot be realized if practices of reporting
to parents remain on the traditional percentage basis,
the A, B, C, D, F categorization, or even the C, S, N innovation.
Letters home and home visits have proven helpful. But
the more promising practice is teacher-pupil-parent confer- 
ences at the elementary level and teacher-pupil conferences 
at the high school level. An increasingly large number of 
schools are using conferences in place of report cards and are 
generally satisfied with the results. Two cities diagonally
across the United States from one another may perhaps be 
considered representative—Arlington, Virginia and Van- 
couver, Washington. Teachers and parents in these two cities 
answer in the following ways some of the questions that are 
most frequently asked in connection with this departure from 
the conventional report card: Yes, it takes time, but teachers 
find that their additional insights pay dividends in helping 
pupils. Yes, pupils work even harder when the threat of grades 
is removed. No, parents are not 100 per cent for the plan, 
but in Arlington, 92 per cent of them are. Yes, it takes a con- 
tinuous parent-education program. Vancouver teachers report 
that parents must be reeducated each year. No, pupils do not 
lose in achievement. Pupils in both cities are, on the average, 
at or above the national norms for age-grade status. Yes, it is 
definitely worth trying, for pupils, teachers, and parents report
increased understanding of one another and the result
is better rapport.

* Raymond H. Rignall, "Are Report Cards Necessary?" Family Circle,
41 (3):104-111, September, 1952.
* Paul F. Gaiser, A Guide to a Functional Program of Reporting
Pupil Progress to Parents, Vancouver, Wash., 1950, 56 pp. (mimeographed).


SUMMARY


An effective testing program must fit the specific local
needs. The minimal testing program suggested in this chapter
must be considered only as a point of departure, a guide
to planning.

Actually, minimum programs recommended by various
scholars may vary from one test to as many as ten. If one test
is used, it should be a mental-ability test; but in the primary
grades, the reading-readiness test is a more accurate indicator
of the ability required for school. Next in importance is either
the general-achievement test or the diagnostic reading test,
depending upon whether the pupils are in the upper grades or
in the primary grades. Of perhaps equal importance with
achievement tests are silent-reading tests, because reading
skills so strongly condition the pupils’ attitudes toward self and
school. The evaluation of personality is often omitted from
minimal programs. However, inasmuch as sociometric techniques
and some projective techniques are inexpensive and
informative and personality development is such an integral
responsibility of the school, the authors recommend this approach
to personal and social evaluation. The inclusion of
subject-matter tests, which closely resemble diagnostic tests,
would place the program which included them on the borderline
between a strong minimal program and one that approached
the ideal.


Planning an effective testing program is not easy, but
neither is effective teaching a simple process. Just as good
teaching is made up of many separate steps, so the effective
use of tests involves attention to many small details. The result
will be that another step will be taken toward
developing the robust pupil who, when his school days are
over, can steer his own course.


STUDY AND DISCUSSION EXERCISES 
4. Would you consider it more important to use equivalent
forms of intelligence and achievement tests, or to give single tests 
in these areas and add others, such as mechanical- and musical- 
aptitude tests? 

5. Draw up a tentative schedule for test administration for the 
entire year, giving the days of the week and the dates. Submit 
it to your colleagues for suggestions and improvement. 

6. How would you suggest that the testing program in a ten- 
teacher eight-grade elementary school be launched? Give details. 

7. Get the help of some of your colleagues in drawing up a 
cumulative record which would be adequate for what you regard 
as a good testing program. 


SUGGESTED ADDITIONAL READINGS 


Cole, Lawrence E., and William F. Bruce: Educational Psychol- 
ogy, Yonkers, N.Y.: World Book Company, 1950, pp. 625-671. 
This survey of the origin, development, and use of tests provides 
a good background for wise selection of tests. The authors stress 
the need for keeping accurate records and for interpreting results. 
Jordan, A. M.: Measurement in Education, New York: McGraw-Hill
Book Company, Inc., 1953, pp. 67-94.
This chapter deals mainly with achievement testing, but the
suggestions are detailed. Illustrative material is included.
Knapp, Robert H.: Practical Guidance Methods, New York: McGraw-Hill
Book Company, Inc., 1953, pp. 1-54.
The author lists, from the pupil-guidance point of view, suggested
tests in such areas as those mentioned in our chapter. He
provides illustrations of cumulative records.
Mursell, James L.: Psychology for Modern Education, New York:
W. W. Norton & Company, Inc., 1952, pp. 391-469.
Against a background of theory relating to intelligence and special
abilities, the author describes and evaluates in these two
chapters a number of intelligence and ability tests. The material
provides a good basis for planning a testing program.
Tiegs, Ernest W.: Educational Diagnosis, Educational Bulletin no.
18, Los Angeles: California Test Bureau, 1948, 16 pp. (free).
This pamphlet is a description of the use of diagnostic tests in
improving instruction. There are many sound and practical suggestions,
although the endorsement of personality tests seems
somewhat too hearty.


APPENDIX 


Organizing Test Results for 
Interpretation 


Developing effective ways to gather information about pupils 
is an essential teacher activity. An equally important activity 
is the task of organizing the data to permit analysis, comparisons,
and interpretations.

The purpose of this appendix is to present some of the
techniques which may assist the teacher to organize data in
a meaningful way. Methods of ordering and recording scores
and developing central reference points and certain relative
measures are described, and an annotated list of references is
presented to assist the teacher who is interested in developing
an understanding of other, more rigorous statistical procedures.
The content of this appendix has been selected on
the basis of simplicity and possibility of use by classroom
teachers rather than on the basis of a criterion of essential
mathematical precision or adequacy.


Ranking Scores 


Mr. Brown has just scored a science examination for his
eighth-grade pupils. The test contained thirty-five items,
each valued at one score point. If Mr. Brown copied the
scores from the answer sheets without a plan of organization,
the results might appear like this:


24, 20, 26, 16, 23, 25, 30, 24,
19, 21, 24, 28, 20, 23, 32, 26,
21, 25, 21, 18, 32, 24, 23, 15,
25, 22, 29, 26, 23, 26. (N = 30)


Organized in rank order from high to low, the list of scores
is more meaningful:


32, 32, 30, 29, 28, 26, 26, 26,
26, 25, 25, 25, 24, 24, 24, 24,
23, 23, 23, 23, 22, 21, 21, 21,
20, 20, 19, 18, 16, 15. (N = 30)
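
For teachers who keep scores on a computer, the ranking step can be sketched in a few lines of Python. This is only an illustration, not part of Mr. Brown's procedure; the variable names are hypothetical, and the first score is taken as 24, the value that agrees with the ranked list.

```python
# Mr. Brown's thirty science-test scores, in the order copied
# from the answer sheets (first score assumed to be 24).
scores = [
    24, 20, 26, 16, 23, 25, 30, 24,
    19, 21, 24, 28, 20, 23, 32, 26,
    21, 25, 21, 18, 32, 24, 23, 15,
    25, 22, 29, 26, 23, 26,
]

# Arrange the scores in rank order from high to low.
ranked = sorted(scores, reverse=True)

print(ranked)
print("N =", len(ranked))
```

The unordered list is left untouched, so it can still be checked against the answer sheets.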


The Tally Sheet 


Another means of giving meaningful organization to a set 
of scores is a tally sheet, or frequency table, such as that presented
in Figure 20. Scores in this table are presented in units.
In some instances the teacher might wish to group the scores 
in intervals of two, three, or five to provide a convenient sum- 
mary table. Steps in preparing the tally sheet are presented 
with Figure 20. 

The tally sheet has the following values for the teacher: 


1. It presents test results organized in terms of size of 
score. 

2. It represents a summary of the test results in a form 
easily scanned for information. 

3. It presents scores in a form which permits checks and 
additional calculations if they are desired. 

4. Properly documented as to date, type of test, grade,
and teacher, the tally sheet constitutes a permanent
record of the results of the test. The teacher may wish
to use this record (a) in comparing two or more classes,
(b) as an aid to the assignment of grades, (c) in comparing
individuals with the group, and (d) in considering
the status of individuals relative to the average or
typical performance of the group.


Score    Tally    Frequency    f (score)
 35
 34
 33
 32      //           2           64
 31
 30      /            1           30
 29      /            1           29
 28      /            1           28
 27
 26      ////         4          104
 25      ///          3           75
 24      ////         4           96
 23      ////         4           92
 22      /            1           22
 21      ///          3           63
 20      //           2           40
 19      /            1           19
 18      /            1           18
 17
 16      /            1           16
 15      /            1           15
  N      30          30          711


FIG. 20. Tally sheet, or frequency table, of science-test scores of 30
eighth-grade pupils.


Tally:

1. List units for range of scores from highest to lowest (column 1).

2. Tally actual scores from pupil answer sheets; check tallies with
number of test papers (column 2).

3. Sum tallies and record for each score (column 3).

Arithmetic mean (average):

1. In column 4 each score has been multiplied by the frequency
of that score (e.g., 32 × 2 = 64).

2. At foot of column 4 is the sum of all the scores (the sum of
column 4).

3. Divide the sum of all the scores by the number of scores or N
at the foot of column 2 (e.g., 711 ÷ 30 = 23.7).

4. The result of this calculation (23.7) is the arithmetic mean or
average.
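
The tally sheet and the arithmetic mean described with Figure 20 can also be produced mechanically. The following Python sketch is offered only as an illustration; the names `frequency` and `sum_of_scores` are hypothetical, and the first score in the list is assumed to be 24, in agreement with the ranked list.

```python
from collections import Counter

scores = [
    24, 20, 26, 16, 23, 25, 30, 24,
    19, 21, 24, 28, 20, 23, 32, 26,
    21, 25, 21, 18, 32, 24, 23, 15,
    25, 22, 29, 26, 23, 26,
]

# Count how many pupils earned each score (columns 2 and 3 of the tally sheet).
frequency = Counter(scores)
n = sum(frequency.values())

# List the units from the highest to the lowest score earned and
# accumulate frequency times score (column 4).
sum_of_scores = 0
print("Score  Tally  Frequency  f x score")
for score in range(max(scores), min(scores) - 1, -1):
    f = frequency[score]  # a Counter returns 0 for a score nobody earned
    sum_of_scores += f * score
    if f:
        print(f"{score:>5}  {'/' * f:<5}  {f:>9}  {f * score:>9}")
    else:
        print(f"{score:>5}")

# Arithmetic mean: the sum of all scores divided by the number of scores.
mean = sum_of_scores / n
print(f"N = {n}, sum = {sum_of_scores}, mean = {mean:.1f}")
```

Checking that `n` equals the number of test papers serves the same purpose as checking the tallies by hand.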


REPRESENTATIVE MEASURES 


The teacher may wish to establish a single score which 
best represents the performance of a group. For example, 
when asked, “How did your group perform on the test?” Mr. 
Brown might answer, “The average score was 23.7.” In effect 
Mr. Brown has attempted to represent an entire set of scores 
by means of one quantity. Such measures are termed measures 
of central tendency. Three such measures, the mid-measure, 
the arithmetic mean (average), and the median are presented 
here. 

The Arithmetic Mean. The arithmetic mean, or average, is 
a statistic with which most of us are familiar. It is one way of 
encompassing a variety of data in one quantitative statement. 
Thus we may describe a pupil as of average height or 
weight, of average intelligence, as an average fourth grader. 
Instead of the term average we might use typical or represen- 
tative. 

Aside from its value as a representative measure, the arith- 
metic mean can be utilized as a point of reference. For ex- 
ample, when we say that Judy is above or below average for 
her age or grade in any specified trait, we are using the aver- 
age as a reference point. In educational evaluations, the refer- 
ence point is seldom if ever an absolute quantity, like size 
of score; rather it is a point, usually somewhere in the center 
of a distribution of scores. The arithmetic mean is an example 
of this type of reference point to which a series of test scores
can be related.

The arithmetic mean is calculated by summing all the
scores and dividing by the number of pupils for whom scores 
have been recorded. Calculations may be developed from a 
tally sheet or frequency table in the manner outlined in Fig- 
ure 20. 

Some values and uses of the arithmetic mean are the fol- 
lowing: 


1. It is a relatively stable and accurate representative meas- 
ure. 

2. It may be used as a point of reference with which the 
performance of individual pupils within the group may 
be compared. 

3. When the same test is used with two or more groups, 
the mean may form a basis for comparison of the 
groups. 

4. The mean forms the basis for the calculation of other
measures, such as the standard deviation and standard
scores.


The Mid-measure. A second type of central reference point 
or expression of central tendency is the mid-measure. The 
mid-measure is the middle score of a series of scores. When 
the number of scores is even, the mid-measure is the average 
of the two scores nearest the middle of the distribution of 
scores. This measure is likely to be useful when the teacher
needs only a quick and very approximate indication of central
tendency. In the case of our distribution of science-test
scores (Figure 20), the mid-measure (the average of the
fifteenth and sixteenth scores) is 24.
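
A short Python sketch of the mid-measure, under the definition given above, may make the odd and even cases concrete. The function name `mid_measure` is hypothetical, and the score list assumes 24 as the first score.

```python
def mid_measure(scores):
    """Return the middle score of a ranked series.

    When the number of scores is even, return the average of the
    two scores nearest the middle of the distribution.
    """
    ranked = sorted(scores, reverse=True)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]
    # Even N: average the two middle scores (the 15th and 16th when N = 30).
    return (ranked[middle - 1] + ranked[middle]) / 2


scores = [
    24, 20, 26, 16, 23, 25, 30, 24,
    19, 21, 24, 28, 20, 23, 32, 26,
    21, 25, 21, 18, 32, 24, 23, 15,
    25, 22, 29, 26, 23, 26,
]

print(mid_measure(scores))
```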

The Median. The median is a point on a scale of scores
which divides the distribution into two equal parts. That is,
one half of the scores fall above the median and one half below
this point. The median is computed from a frequency
table or tally sheet such as that presented in Figure 21. For
purposes of computation the scores are regarded as a continuous
series. Each unit score (for example, a score of 25)
is considered to represent a range of achievement from 24.5
to 25.5, much as an inch on a foot rule may be regarded as
a distance on a continuous linear measure rather than as a
point.


Test       Continuous                    Cumulative
scores     scale          Frequency      frequency
 32        31.5-32.5          2              30
 31        30.5-31.5
 30        29.5-30.5          1              28
 29        28.5-29.5          1              27
 28        27.5-28.5          1              26
 27        26.5-27.5
 26        25.5-26.5          4              25
 25        24.5-25.5          3              21
*24       [23.5-24.5]         4              18
 23        22.5-23.5          4             [14]
 22        21.5-22.5          1              10
 21        20.5-21.5          3               9
 20        19.5-20.5          2               6
 19        18.5-19.5          1               4
 18        17.5-18.5          1               3
 17        16.5-17.5
 16        15.5-16.5          1               2
 15        14.5-15.5          1               1

N = 30


FIG. 21. Calculation of the median.


1. Find half the number of scores (N/2 = 30/2 = 15).

2. From column 4, find the cumulative frequency equal to or less
than N/2 (i.e., 14). The median will lie in the score interval immediately
above this.

3. Divide the size of the score interval by the number of scores in
the interval which contains the middle score (i.e., 1 ÷ 4 = .25).

4. Multiply this correction (.25) by the number of scores needed
to reach the midpoint of the distribution, in this case 15 (midpoint)
less 14 (cumulated below the interval which contains the fifteenth
score) (15 − 14 = 1 and 1 × .25 = .25).

5. Add the result of calculations in step 4 (.25) to the lower limit
(23.5) of the interval in which lies the midpoint of the distribution.
The result is the median. The median is 23.5 + .25 = 23.75.

The method of computing the median is presented with 
Figure 21. This measure serves many of the same purposes 
as the arithmetic mean; it is a central reference point which 
may facilitate comparisons of individuals and groups. It is 
not ordinarily so stable as the mean, but it does have some 
advantages when the teacher wishes to avoid giving emphasis 
to extreme scores. 
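
The five steps given with Figure 21 can be followed in a short Python sketch. It is a minimal illustration that assumes unit score intervals (each score s covering s − 0.5 to s + 0.5); the function name `interpolated_median` is hypothetical, and the score list assumes 24 as the first score.

```python
from collections import Counter


def interpolated_median(scores):
    """Median on the continuous scale, following the steps of Figure 21."""
    frequency = Counter(scores)
    half = len(scores) / 2              # step 1: N/2
    cumulative = 0                      # scores cumulated below the interval
    for score in sorted(frequency):     # work upward from the lowest score
        f = frequency[score]
        if cumulative + f >= half:      # step 2: this interval holds the midpoint
            correction = 1 / f          # step 3: interval size / scores in interval
            needed = half - cumulative  # step 4: scores needed to reach the midpoint
            lower_limit = score - 0.5
            return lower_limit + needed * correction  # step 5
        cumulative += f
    raise ValueError("no scores given")


scores = [
    24, 20, 26, 16, 23, 25, 30, 24,
    19, 21, 24, 28, 20, 23, 32, 26,
    21, 25, 21, 18, 32, 24, 23, 15,
    25, 22, 29, 26, 23, 26,
]

print(interpolated_median(scores))
```

For the science-test scores the midpoint falls in the interval 23.5 to 24.5, with 14 scores cumulated below it, so the sketch reproduces the hand calculation.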

The median is essentially a counting or ranking measure
which emphasizes relative position rather than actual size of
score. For example, in the following series the arithmetic
mean is affected markedly by alteration of one extreme score,
whereas the median is unaffected by the change.


Series
  A    80, 40, 38, 32, 30
  B    45, 40, 38, 32, 30

Mean series A = 220 ÷ 5 = 44
Mean series B = 185 ÷ 5 = 37
Median series A and B = 38
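
The comparison can be checked with Python's standard `statistics` module. This is only a sketch; it takes the extreme score of series A as 80, the value consistent with the stated sum of 220, and with only five scores the median is simply the middle score.

```python
import statistics

# Series A and B differ only in the one extreme score
# (series A's extreme score assumed to be 80, so that the sum is 220).
series_a = [80, 40, 38, 32, 30]
series_b = [45, 40, 38, 32, 30]

mean_a = statistics.mean(series_a)
mean_b = statistics.mean(series_b)
median_a = statistics.median(series_a)
median_b = statistics.median(series_b)

print("Mean series A:", mean_a)
print("Mean series B:", mean_b)
print("Median series A:", median_a)
print("Median series B:", median_b)
```

Altering the one extreme score moves the mean by seven points and leaves the median where it was.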


STUDYING THE DISTRIBUTION OF SCORES 


Although measures of central tendency such as the mean 
and median are useful as reference points for comparisons and 
interpretations, a single point in a distribution fails to tell 
the whole story. An important consideration may be the 
extent to which scores are distributed over the range of the 
test. For example, in Figure 22 the distributions of scores for
class A and class B indicate that the two groups are not alike
in performance on the test although the means and medians
of the two groups are quite similar. The range of scores for
class A is greater than that for class B.


DISTRIBUTION OF SCORES

                         Frequency,    Frequency,
Score                    class A       class B
 32                         2
 31
 30                         1
 29                         1             1
 28                         1             2
 27                                       3
 26                         4             3
 25                         3             4
 24                         4             4
 23                         4             2
 22                         1             3
 21                         3             4
 20                         2             3
 19                         1             1
 18                         1
 17
 16                         1
 15                         1

N                          30            30
Sum of scores             711           715
Median                     23.75         24.0
Mean                       23.7          23.8
Range                      17            10
Q3                         25.8          26.0
Q1                         21.0          21.4
Interquartile range         4.8           4.6
Q = (Q3 − Q1) ÷ 2           2.4           2.3


FIG. 22. Distribution of scores for class A and class B on an eighth-grade
test in social studies.

The teacher may study the dispersion of scores by means 
of a tally sheet, and he may wish to record the range, which 
is the difference between the highest and lowest scores. In the 
case of class A (Figure 22) the range is 32 - 15, or 17. The
range for class B is 29 - 19, or 10.

The range is a relatively unreliable measure, readily influenced
by changes in individual scores at the extremes of the
distribution. However, this measure provides (a) a simple
method of describing the dispersion of a set of scores and (b)
additional information beyond that represented by measures
of central tendency.
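
The range and the mean for the two classes can be recomputed from a frequency table. The sketch below assumes the frequencies of Figure 22 as they are recovered from the figure's totals (N = 30 for each class, sums of 711 and 715); the dictionary layout and function names are illustrative only.

```python
# Frequencies keyed by score (assumed reading of Figure 22).
class_a = {32: 2, 30: 1, 29: 1, 28: 1, 26: 4, 25: 3, 24: 4, 23: 4,
           22: 1, 21: 3, 20: 2, 19: 1, 18: 1, 16: 1, 15: 1}
class_b = {29: 1, 28: 2, 27: 3, 26: 3, 25: 4, 24: 4, 23: 2,
           22: 3, 21: 4, 20: 3, 19: 1}

def score_range(freq):
    """Highest minus lowest score that actually occurs."""
    observed = [score for score, count in freq.items() if count > 0]
    return max(observed) - min(observed)

def mean_score(freq):
    """Sum of scores divided by the number of scores."""
    total = sum(score * count for score, count in freq.items())
    return total / sum(freq.values())

for name, freq in (("A", class_a), ("B", class_b)):
    print(f"Class {name}: range = {score_range(freq)}, "
          f"mean = {mean_score(freq):.1f}")
```

The two means come out nearly alike while the ranges differ widely, because the range depends only on the two most extreme scores.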


Quartiles, Deciles, and Percentiles 


A number of measures may be used to indicate various 
points in the distribution. Among such measures are quartiles, 
deciles, and percentiles, which divide the distribution into 
quarters, tenths, and hundredths. The first quartile (Q1) is
a point which sets off the lowest 25 per cent of the scores.
The third quartile (Q3) is a point below which fall 75 per
cent of the scores. Quartile two (Q2) is identical with the
median in location and definition. 

Deciles are points below which fall the indicated tenths of 
the scores (e.g., decile 7 marks off the lower seven-tenths of 
the distribution). 

Percentiles mark off the indicated per cent of the distribution;
for example, percentile 75 (P75) is the point below
which fall 75 per cent of the cases.

Percentile   Decile   Quartile   Median

    95
    90          9
    85
    80          8
    75                    3
    70          7
    65
    60          6
    55
    50          5         2      Median
    45
    40          4
    35
    30          3
    25                    1
    20          2
    15
    10          1
     5

Fig. 23. Relationships between percentiles, deciles, quartiles, and the
median.


The calculation of all these measures is based on the assumption
of a continuous distribution, and all are calculated
in essentially the same manner as the median when the appropriate
common or decimal fraction is substituted in the
formula. Figure 23 illustrates the relation between these measures,
and Figure 24 illustrates the method of calculation of
percentiles. General procedures for the calculation of quartiles,
deciles, and percentiles are as follows:


1. Multiply the total number of scores by the desired fraction
(quartile, decile, or percentile).

2. Find the nearest cumulative total of frequencies equal
to or less than the fraction of the distribution represented
by the selected quartile, decile, or percentile point.

3. Calculate the fractional “distance” into the next higher
continuous score interval necessary to reach the desired
point.

4. Multiply the fraction (step 3) by the size of the score
interval.

5. Add this quantity to the lower limit of the score interval
in which the desired quartile, decile, or percentile
point is located. The result is the desired point in the
distribution.


Test      Continuous                    Cumulative
scores    scale          Frequency      frequency

32        31.5-32.5      2              30
31        30.5-31.5
30        29.5-30.5      1              28
29        28.5-29.5      1              27
28        27.5-28.5      1              26
27        26.5-27.5
26        [25.5-26.5]    [4]            25
25        24.5-25.5      3              21
24        23.5-24.5      4              18
23        22.5-23.5      4              14
22        21.5-22.5      1              10
21        20.5-21.5      3              9
20        19.5-20.5      2              6
19        18.5-19.5      1              4
18        17.5-18.5      1              3
17        16.5-17.5
16        15.5-16.5      1              2
15        14.5-15.5      1              1

N                        30

Fig. 24. Calculation of percentiles: How to find percentile 75.

1. Find 75 per cent of 30, or 22.5. Locate in column 4 the cumulative
frequency less than 22.5. This is 21.
2. Percentile 75 is located in the next interval above cumulative
frequency 21, or in interval 25.5-26.5.
3. Find the difference between the computed P75 point (22.5) and
the nearest cumulative total less than this (22.5 - 21 = 1.5).
4. Since the size of the score interval is one unit and there are four
scores in the interval in which P75 is located, each score has a value
of 1 ÷ 4, or .25.
5. Multiply 1.5 × .25 = .375. This is the correction which, added to
the lower limit of the interval, brings us to percentile 75.
6. Add the correction (.375) to the value of the lower limit of the
interval which contains percentile 75. (25.5 + .375 = 25.875, or,
when rounded, 25.9.) Percentile 75 is 25.9.
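
The general procedure can be sketched as a short Python function. This is an illustration under stated assumptions rather than a prescribed method: the name `percentile_point`, its `interval` parameter, and the dictionary of class A frequencies (as read from Figure 24) are all introduced here for the example.

```python
def percentile_point(freq, pct, interval=1.0):
    """Point below which `pct` per cent of the scores fall.

    `freq` maps each score to its frequency. Each score s is treated as
    the continuous interval from s - interval/2 to s + interval/2.
    """
    if not 0 <= pct <= 100:
        raise ValueError("pct must lie between 0 and 100")
    target = sum(freq.values()) * pct / 100      # step 1
    cumulative = 0                               # cases below current interval
    for score in sorted(freq):
        count = freq[score]
        if count > 0 and cumulative + count >= target:
            # step 2: `cumulative` is the total at or below the target
            fraction = (target - cumulative) / count   # step 3
            lower_limit = score - interval / 2
            return lower_limit + fraction * interval   # steps 4 and 5
        cumulative += count
    raise ValueError("the distribution contains no scores")

# Class A frequencies (assumed reading of Figure 24).
class_a = {32: 2, 30: 1, 29: 1, 28: 1, 26: 4, 25: 3, 24: 4, 23: 4,
           22: 1, 21: 3, 20: 2, 19: 1, 18: 1, 16: 1, 15: 1}

print(percentile_point(class_a, 75))   # percentile 75 (third quartile)
print(percentile_point(class_a, 50))   # percentile 50 (the median)
```

Only the percentage changes from one measure to another; decile 7, for instance, is obtained with `pct=70`.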


Measures such as quartiles, deciles, and percentiles indi- 
cate how an individual stands in relation to the group from 
which the measures were derived. Such information is likely 
to be more valuable than actual test scores (raw scores) for 
purposes of evaluation and for permanent records. 


Interquartile and Semi-interquartile Ranges 


The quartile points form the basis for a statistic which can 
be used to find the dispersion of scores around the median. 
The range is markedly affected by changes in scores at either 
extreme of the distribution. A more stable estimate of dispersion
is provided by the interquartile range, or the difference
between Q3 and Q1. This is the range of scores which includes
approximately the middle 50 per cent of cases. For the score
distributions of class A and class B in Figure 22, the values
of Q1 and Q3 for each class have been indicated. The interquartile
ranges have been calculated by the formula


Q3 - Q1 = interquartile range


It will be noted from Figure 22 that, although the range
seems to indicate a marked difference in dispersion in the case
of these two groups of scores, the interquartile ranges of 4.8
and 4.6 indicate that over the central area of the distributions
the spread of scores is quite similar.

The semi-interquartile range (Q) is frequently used to describe
the variability of scores around the median. Q is one-half
the interquartile range. In the case of a set of scores
which are distributed symmetrically around the median, the
range of the middle 50 per cent of scores lies between the
median plus Q and the median minus Q.
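
The interquartile and semi-interquartile ranges for the two classes can be recomputed as follows. This is a hedged sketch: the helper `percentile_point` and the frequency dictionaries are assumptions made for the example, and unit score intervals are assumed throughout. The unrounded results agree with Figure 22 only to within rounding, since the figure works from quartiles already cut to one decimal place.

```python
def percentile_point(freq, pct):
    """Interpolated point below which `pct` per cent of the scores fall.

    Assumes unit score intervals: a score s stands for s - 0.5 to s + 0.5.
    """
    target = sum(freq.values()) * pct / 100
    cumulative = 0
    for score in sorted(freq):
        count = freq[score]
        if count > 0 and cumulative + count >= target:
            return (score - 0.5) + (target - cumulative) / count
        cumulative += count
    raise ValueError("empty distribution or pct above 100")

# Frequencies keyed by score (assumed reading of Figure 22).
class_a = {32: 2, 30: 1, 29: 1, 28: 1, 26: 4, 25: 3, 24: 4, 23: 4,
           22: 1, 21: 3, 20: 2, 19: 1, 18: 1, 16: 1, 15: 1}
class_b = {29: 1, 28: 2, 27: 3, 26: 3, 25: 4, 24: 4, 23: 2,
           22: 3, 21: 4, 20: 3, 19: 1}

def spread(freq):
    """Return Q1, Q3, the interquartile range, and Q (half that range)."""
    q1 = percentile_point(freq, 25)
    q3 = percentile_point(freq, 75)
    iqr = q3 - q1
    return q1, q3, iqr, iqr / 2

for name, freq in (("A", class_a), ("B", class_b)):
    q1, q3, iqr, q = spread(freq)
    print(f"Class {name}: Q1 = {q1:.3f}, Q3 = {q3:.3f}, "
          f"interquartile range = {iqr:.3f}, Q = {q:.3f}")
```

The two interquartile ranges differ by about a quarter of a score point, whereas the simple ranges differ by seven points.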


EXAMINING RELATIONSHIPS 


For some purposes the teacher may wish to study the re- 
lationships between two or more sets of scores. For example, 
measures of ability and achievement are frequently compared. 

To study the relationship between two sets of scores or
characteristics which may be evaluated along a scale, the
teacher may use a scattergram such as that illustrated in
Figure 25. Such a scattergram may help the
teacher to locate pupils who (1) do not appear to be achiev- 
ing at the level which might be expected of them, (2) are 
achieving at a level higher than might be expected, and (3) 
are working at a level of reasonable expectancy, although 
their achievement is low. The scattergram merely organizes 
the data so that relationships such as those above are more 
evident. Like the other techniques presented in this appendix, 
it is a method of organizing data to clarify certain character- 
istics and relationships of score distributions. These tech- 
niques do not tell the teacher how to evaluate these observa- 
tions. For instance, from the scattergram illustrated in Figure 
25, it appears that C. F. and D. R., although at the extremes 
of the class in both ability and achievement, are placed about 
where we might expect to find them. E. K., on the other hand,
is commonly termed an underachiever, since relative ability is
considerably in excess of relative achievement. N. T. might be
termed an overachiever, since, with low-average ability as
compared to the group, he is achieving a relatively high level.


[Scattergram: ability quartile (rows) against achievement quartile
(columns), with divisions marked at Q1, Md, and Q3.]

Fig. 25. A scattergram designed to indicate the relationship between
ability and achievement for selected pupils. In this diagram, quartiles
have been used to indicate relative standing within the group for each
test. Results have been indicated by placing the pupil's initials in the
appropriate cell as follows: (a) C. F. ranks in the highest quarter in
ability (row 4) and also in achievement (column 4). Hence his
initials appear in the cell representing the intersection of row 4 and
column 4. (b) D. R. ranks in the lowest quarter of the group in each
of the areas. Hence his initials appear in the cell representing the
intersection of row 1 and column 1. (c) E. K. ranks in the third quarter
in ability (row 3) and the first quarter in achievement (column 1).
(d) N. T. ranks in the second quarter in ability (row 2) but in the
fourth quarter in achievement (column 4).


Both E. K. and N. T. might be worth study on the part of 
the teacher to try to identify possible reasons for the dis- 
crepancy between achievement and ability. 
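
The placement of pupils in the scattergram can be sketched in Python. The quartile standings below are those given in the caption of Figure 25; the rule that a difference of two or more quartiles marks a discrepancy is an assumption introduced only for this illustration, as are the names `classify` and `GAP`.

```python
# (ability quartile, achievement quartile); 1 = lowest quarter, 4 = highest.
pupils = {
    "C. F.": (4, 4),
    "D. R.": (1, 1),
    "E. K.": (3, 1),
    "N. T.": (2, 4),
}

GAP = 2  # hypothetical threshold, not taken from the text

def classify(ability, achievement, gap=GAP):
    """Label a pupil by the difference between the two quartile standings."""
    difference = achievement - ability
    if difference <= -gap:
        return "underachiever"
    if difference >= gap:
        return "overachiever"
    return "about as expected"

# 4 x 4 grid: rows are ability quartiles, columns are achievement quartiles.
grid = [[[] for _ in range(4)] for _ in range(4)]
for initials, (ability, achievement) in pupils.items():
    grid[ability - 1][achievement - 1].append(initials)

# Print the highest ability row first, as in a conventional scattergram.
for ability in range(4, 0, -1):
    cells = [", ".join(cell) for cell in grid[ability - 1]]
    print(f"ability {ability} | " + " | ".join(f"{c:<6}" for c in cells))

for initials, (ability, achievement) in pupils.items():
    print(initials, "->", classify(ability, achievement))
```

As the text cautions, such a label only flags a pupil for study; it does not explain the discrepancy.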


SUGGESTED ADDITIONAL READINGS 


Froelich, C. P., and J. G. Darley: Studying Students, Chicago:
Science Research Associates, Inc., 1952.
Chapters 2 and 3 present an overview of methods of summarizing
and analyzing test scores.

Remmers, H. H., and N. L. Gage: Educational Measurement and
Evaluation, rev. ed., New York: Harper & Brothers, 1955.
Chapter 21 of this comprehensive text describes statistics related
to the interpretation of test scores.

Ross, C. C., and J. C. Stanley: Measurement in Today's Schools,
3d ed., Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1954.
Chapter 3 presents an account of statistical measures as an aid
to the analysis of test results.

Thorndike, R. L., and E. Hagen: Measurement and Evaluation
in Psychology and Education, New York: John Wiley & Sons,
Inc., 1955.
Chapter 5 introduces statistical concepts related to the study
and interpretation of test scores and distributions.

Wrightstone, J. W., J. Justman, and I. Robbins: Evaluation in
Modern Education, New York: American Book Company, 1956.
An appendix, pp. 447-457, presents a concise discussion of
fundamental statistical concepts.


Glossary 


ability. Power to perform a specified act. Capacity for accomplishment
as opposed to potential.

absolute measure. Units of measurement defined and interpreted
in terms of a fixed standard or basis, e.g. units of linear
measurement such as inches and feet.

achievement test. A test designed to measure the extent to which 
an individual has acquired certain knowledges or skills as 
a result of a program of instruction. 

adequacy. Inclusion in a test of sufficient samples of behavior or 
performance to constitute a good indication of the total. 

age norm. Average score obtained by pupils of a given age. The 
typical score or value representative of a certain age. (See 
grade norm.) 

ambiguity. The capacity of a test item to be interpreted in more
than one way; such items are subject to various meanings
and thus undesirable as test items.

anecdotal record. A series of brief, written descriptions of typical
behaviors of a pupil.

appraisal. An evaluation based on data from many sources or
considering multiple facets of personality and achievement.

aptitude. The potential or combination of potentials indicative of
one's probable capacity to learn in some particular area.
(One may have an aptitude for music without possessing
the ability to perform musically.)

attitude test. A set of questions or hypothetical situations designed
to determine one's mental predispositions or leanings. A device
for estimating how one will act or believe or what beliefs
and actions one has readiness for.
average. See mean. 


capacity. Potentiality for the development of a skill or knowledge. 
(One may have the capacity, or the potentiality for devel- 
oping the ability, to play the piano.) 

character test. A device used to evaluate that aspect of personality 
which relates to ethical, moral, and religious situations and 
concerns the right and wrong of conduct. Measures the inner, 
consistent trends of behavior. 

check list. A device for gathering data by means of a list of pre- 
determined items. The respondent has only to mark the 
items which are pertinent in his response. 

coefficient of correlation (or validity, or reliability). A numerical 
expression of the extent of agreement between two measures 
or measuring instruments. It is expressed in decimal fractions 
ranging from a plus 1.00 (perfect positive agreement) 
through 0.0 (no relationship one way or another) to a
minus 1.00 (perfect negative relationship: the more of one,
the less of the other).


comparability. See equivalent test. The quality of tests that makes 
it possible to use them as substitutes for one another. Having 
the same number of items of the same degree of difficulty 
and covering the same scope or range of material. 

completion test. A test made up of items consisting of a sentence 
or statement from which a word or words have been omitted. 
The student is expected to supply the missing word or words 
in giving his answer. 

constant alternatives. A device used in rating techniques to provide
a constant set of rating steps (e.g., excellent, good, fair,
poor) which apply to a number of characteristics being
rated.


criterion. A model, point, or standard for comparison which provides
the basis for judging the merit of a test, behavior, or
situation.

cumulative record. A card or folder, usually printed, which provides
blanks or spaces for the periodic entry of significant
data about the child's development. Birth date, family data,
address, school record, results of ability and achievement
tests, personality schedules, and anecdotal records are among
the data commonly recorded.


deciles. Points in a distribution of scores which divide the distri- 
bution into ten equal parts in terms of frequency of scores. 

derived score. A score which has been converted from the raw
score, e.g. age scores, grade scores, standard scores, percentile
scores. Derived scores give meaning to a given raw
score.

descriptive rating scale. A rating scale providing a continuum that
presents verbal descriptions and definitions of the degrees
of possession of the trait being measured.

diagnosis. The interpreting of data in such a way as to determine
what the specific causes of a pupil's difficulty are. Also, the
verbalized statement of the interpretation of data.

diagnostic test. A test that indicates areas of specific difficulty,
usually in skill subjects such as reading, spelling, or arithmetic.
The actual diagnosis of the difficulty is made by the
teacher or clinician who interprets the test results.

economy. The characteristic of a test relating to its cost, figured
on the basis of how much the test does to facilitate instruction
and how many pupils are serviced. Economy also relates
to the saving of the teacher's time in test construction, scoring,
and evaluation of results.

educational age. A derived score expressed in terms of the average
score earned by pupils at a given age on batteries of
subject-matter tests or on tests of specific subject areas. Often
expressed as a grade equivalent.

equivalent score. An alternative way of expressing a score so as
to give additional insight into its meaning. Thus a percentile
score may be translated into “equivalent” grade-placement
or mental-age scores.

equivalent test. A test designed to sample exactly the same area
of behavior as another test, so that the scores do not vary
significantly when the two tests are given under identical
circumstances. An equivalent test should have the same
number of items, sample the same areas, and contain items
of equal degrees of difficulty.

essay examination. A series of questions which the pupil is to
answer by writing compositions. The answer to the question
is "discussed" in writing by the pupil.

evaluation. The process of determining the worth of a given
individual’s personality, performance, or merit. Usually depends
upon data from many sources and of many varieties.

examination. See test.


fixed pattern. A relatively consistent rating tendency on the part
of the rater, e.g., consistently rating most pupils high, low,
or average, thus failing to disperse ratings over the length of
the scale.

frequency distribution. A tabulation of scores (tally sheet) arranged
in serial order (e.g., from high to low) showing the
number of scores falling at each point in the distribution.

grade equivalent. See grade norm.

grade norm. A derived score expressed in years and months of
location in the elementary and high school. A grade score
of 8.4 means that the subject’s score is about the same as
that of the average child who has been in the eighth grade for
four months. The grade norm may be based on either the
median or the average of the distribution of the scores for the
grade level.


grades. Numerical or al
given area of his school e

group test. A pencil-and-paper test in which several subjects are
tested simultaneously by one examiner. Most classroom tests
are of the group variety.

“halo” effect. The result of influence of a rater's general impressions
of the subject on his evaluation of the individual with
respect to some particular quality or performance.


individual test. A test, either verbal or nonverbal, in which one 
examiner tests one subject at a time. The Stanford-Binet and 
Wechsler-Bellevue are examples of individual tests. 

intelligence test. An evaluative device designed to express quan- 
titatively the relative status of a subject with respect to mental 
maturity or level of mental functioning. May be designed 
to estimate general intellectual ability or to assess specified 
intellectual or mental factors or characteristics. 

interest inventory. A means of measuring extent of attraction to 
certain specified types of activity. May be called interest or 
preference tests or questionnaires and may be designed to 
assess vocational, educational, social, or personal preference. 

inventory. A term frequently used to replace the term test in meas- 
urement areas such as interest and personality. 


I.Q. Abbreviation for intelligence quotient, an indication of present
rate of mental growth determined by dividing mental
age by chronological age and multiplying by 100. An index of
relative brightness based on scores on a test of intelligence
or mental ability. Typically derived as the ratio between
mental and chronological ages.

isolate. A person who is not chosen by any of the members of his
group on a sociometric test.

item analysis. A study of each question on a test for the purpose
of comparing the number of subjects (out of the total number
taking the test) who missed the item and the number
who answered it correctly. Each question is then studied to
detect ambiguity, validity, reliability, difficulty, and
discriminating power.

marks. Letters or numerical symbols representing attempts to reduce
the complexities of school achievement to a single index.
See grades.

matching test. An examination consisting of two lists of words
or phrases in columns in which the task of the subject is to
pair off each item in one of the lists with a related item in
the other list.

mean (arithmetic mean). The average obtained by dividing the
sum of a group of scores by the number of individual scores
in the group.

measurement. The application of a precise, quantitative unit
value to any property, quality, or outcome.

median. The midpoint of a set of scores arranged in order from
high to low. The point that divides the distribution of scores
into two equal parts so that half the scores fall above and
half below the median.


nonverbal test. A test which does not require the individual to 
write or to read. Examples of nonverbal test items are piling 
blocks in a prescribed pattern, stringing beads, and assem- 
bling puzzles. 

normal curve. A graphic representation of a distribution of scores 
or measures having a distinctively bell-shaped appearance. 
Scores are distributed symmetrically about the mean with
a concentration of scores around the central point and decreasing
frequencies toward the extremes. The normal curve
has definite mathematical properties.

norms. Measures, based on test scores, which describe the performance
of a specified group. Norms may describe average or
typical performance or indicate the status of the individual
or group with respect to the performance demonstrated by
the specified group.


objective test. An examination which can be scored with a minimum
of influence from the scorer's opinion or attitude as
to whether it is right or wrong. Short answers and an established
answer key provide a routine procedure for scoring
of the test items.

objectivity. Absence or minimizing of the personal element in an- 
swering or in scoring a test or test item. 

organismic. Denotes the intimate and inseparable nature of all 
the many facets of growth and behavior in the functioning 
individual. For example, organismic age refers to the average
of chronological, mental, emotional, carpal, physical,
physiological, etc., ages.


pencil-and-paper tests. Tests which require the subject to write 
his responses. Used in contrast to performance tests in which 
the examiner must record the responses or behaviors. 

percentile. A point in a distribution of scores below which a
stated percentage of the cases falls. For example, percentile 
30 is a point in a distribution below which 30 per cent of the 
cases fall. 

percentile rank. The percentile at which a given score falls. For
example, a percentile rank of 20 implies that 20 per cent
of the subjects attained scores equal to or below the specified
score.

performance test. See nonverbal test. A test in which the subject
is requested to do some motor act in a prescribed manner. A
test dependent on a work sample.

personality test. A test or inventory designed to reveal those personal
characteristics of the individual which are considered
to be related to his personality. These instruments also may
be designated as adjustment, personality, or personal inventories.

potentiality. An undeveloped aptitude or capacity. Sometimes
thought to be an inherited characteristic but probably also
dependent upon environmental factors for its nourishment.

practice effect. The influence of practice or previous experience
with a test upon current test results.

prognostic. Partaking of the nature of prediction. A prognostic
test is intended to predict behavior or performance in a particular
area.

projective technique. A method of studying personality or attitudes
through reactions to pictures, meaningless forms, or
material to be assembled. It is assumed that the individual
"projects" his personality, interests, or attitudes in developing
an interpretation of the materials presented to him.


Q. The semi-interquartile range, that is, one-half the range of
the middle 50 per cent of scores in a frequency distribution. 

quartiles. Points in a serially arranged distribution of scores which 
divide the distribution into four equal parts. 

questionnaire. A device designed to provide a rapid means of 
gathering data about an individual. Typically presents a list 
of statements or questions calling for a response. Frequently 
applied to personality and interest inventories. 


rapport. In the area of tests and testing, the development of 
favorable attitudes on the part of the subject toward the 
test materials and testing procedures. 

rating scale. A set of criterion answers or models by means of
which values can be assigned to the samples being judged,
e.g., a handwriting scale. See also graphic rating scale and
descriptive rating scale.

raw scores. The result obtai


&round experience which will enable him profit- 


Y of the area to which the readiness 
test applies, 


recall item. A test question which requires the student to remember
the word or answer which is most appropriate. Examples
of the recall question are completion and essay questions.


recognition item. A test item that requires the subject to identify
an answer. Examples are multiple-choice, true-false, and
matching questions.

relative measures. Measures based on comparisons or relationships
rather than on fixed or constant units. Test norms, for example,
are relative measures in that their derivation and
meaning are based on relationships to a series of scores rather
than on any absolute or fixed value.

reliability. The extent to which a test is consistent in measuring
what it purports to measure. Usually indicated by a coefficient
of reliability or by an indication of the error of measurement.


sampling. See wide sampling. 

scale. A calibrated instrument for measurement. Subdivisions of 
a trait or behavior are “laid off” at intervals along a con- 
tinuous line. As one progresses along the scale the items 
vary in nature, degree, or difficulty. 

scaled test. A test in which the items become progressively more
difficult.

schedule (personality). An inventory or list of personality
characteristics.

scoring formula. A method of systematically deriving a score from 
test data. Formulas may refer to weighting of items or cor- 
rections for guessing. 

sociogram. A graphic representation of the social preferences of 
a specific group. A mapping of personal preferences. 

sociometric test. An instrument designed to provide a basis for
evaluating the interpersonal relationships prevailing among
the members of a group.

sociometry. The study of the attractions and rejections among various
members of a group.

special aptitude. An indication of an individual’s ability to acquire 
a specified skill or knowledge. (See aptitude.) 

standard. A mark or goal to be achieved. Should be distinguished 
from a norm, which is the average or typical score rather 
than the desirable score. 

standard deviation. A measure of variability or dispersion of 
scores around the mean of the distribution. 

standardized test. A test which represents a sample of performances
taken under controlled conditions (of administration
and scoring) and providing a basis for interpretation in the
form of norms or comparable information. A test for which
norms have been established.

standard score. Refers to a score based on the variability of a
distribution of scores around the mean of the distribution. The
basic unit of such scales is the standard deviation.

stencil. As used in this book, not simply a mat for reproducing a 
test but a piece of stiff paper in which holes are punched to 
reveal correct responses on an answer sheet. 

subjectivity. As used in the area of evaluation or measurement,
this term typically refers to the fact that the judgment of
the person scoring responses to test items may be a deciding
factor in evaluation of the responses.

test. An instrument designed to measure any quality, ability, skill,
or knowledge. Usually a sampling, comprised of test items,
of the area it is designed to measure.

test manual. A pamphlet or booklet that accompanies most standardized
tests. Explains the purpose of the test, sometimes relates
its historical development, cites statistical data obtained
during standardization of the test, presents and interprets
norms, cites limitations, and gives careful directions for
administrating and scoring.


trait. One limited aspect of personality or character, e.g., honesty,
sincerity, intelligence, determination, initiative, etc.

true-false test. A series of statements which the testee is to indicate
are either correct or incorrect. Sometimes an alternative
is provided so that the item can be marked “doubtful” or
“questionable.”


validity. The characteristic of a test of really sampling what it is
designed to sample. A valid test of reading really indicates
skill in reading rather than knowledge of a given area or skill
in vocabulary.

verbal test. A test which requires the use of verbal or language
skills. Many group intelligence tests consist largely of verbal
items.

derivation of, 45-47
as derived scores, 46
establishment of, 50-53
grade, 49, 52, 93-97
derivation of, 93
interpretation of, 94
meaning of, 35, 45-47
percentile, 95-97
definition of, 54
difficulties in using, 54-57
function of, 95
interpretation of, 96
as ranking measures, 54-57
as percentile ranks, 53-57
as reference points, 45-47
standard scores as, 57-63, 97
and standard deviation, 60-63
value of, 63
and standardization, 10
and test manual, 63-65
Objective tests, economy of, 210
meaning of, 206 
Objectives of evaluation, 244 
Objectivity, meaning of, 14 
Observation, of attitudes, 172 
for evaluation, 257 
of interests, 164 
Occupational Interest Inventory, 
168 


Parent-teacher conferences, 231 
Parents, letters to, 229 
Percentile, 270 
Percentile norms (see Norms) 
Percentile ranks, 53-57, 80 
definition of, 54 
as intelligence-test norms, 80 
as measures, 54 
and standard score, 62 
Performance test, 7 
Personal adjustment, evaluation 
of, 252 
Personality, definition of, 36, 120 
difficulties in appraising, 128 
evaluation of, informal, 133 
and intelligence tests, 121 
misconceptions of, 123 
pupil, evaluation of, 223 
and rating scales, 127 
tests of, 166 
traits of, 125 
types of, 124-126
Personality inventories, 125, 130 
Physical examinations in basic 
test program, 246 
Pintner-Cunningham tests, 32 
Play analysis, 137 
Prediction, 18
"Primary Pupil Progress Report,"
193
Problem solving, 92 
Projective techniques, 9, 135 
use of, 253 




Psychological measurement, 4 

Publishers of tests, 34 

Pupil development, appraisal of, 
223 

Pupil diagnosis, 82 

Pupil differences, 203 

Pupil self-appraisal, 231 

Purposes of testing, 30 


Quartile, 270 
Questionnaires, 9 
of interests, 165 


Race and personality, 123 
Rating, nature of, 181 
Rating methods, 181-185
improvement of, 188 
use of, problems in, 181 
in school, 197 
Rating scales, 37 
Behavior Observation Record, 
193 
check-list form, 193 
classroom uses, 198 
coded form, 193 
establishing scale steps, 192 
graphic type, 196 
Haggerty-Olson-Wickman Be- 
havior Rating Schedules, 
186 
organization of, 185-187, 193
personality, 127 
"Primary Pupil Progress Re- 
port," 193, 195 
Springfield, Missouri, Senior 
High School, 191 
teacher development of, 198 
types of, 199 
uses in school, 197-199 
Rating techniques, 181 
Ratings, basis for judgment, 190 
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Ratings, definition of traits, 188— 
190 
fixed patterns, 182 
generosity error, 182 
halo effect, 183 
improvement of, 188—197 
reliability of, 184 
selection of traits, 188 
sources of error in, 182—185 
Raw score, 46 
Readiness, 107-109
achievement related to, 88 
estimation of, for learning, 
107-118 
evaluation of, need for, 108 
and general skills, 114-116
and maturation, 114-116
and mental age, 108 
nature of, 107 
preparatory activities for, 114-116
reading, 108, 245 
tests of (see Readiness tests) 
Readiness tests, 88, 109-113
administration of, 111 
characteristics assessed by, 109 
general, 109 
interpretation of scores, 114 
Monroe Reading Aptitude, 111 
norms, 112 
and program planning, 114 
test items, illustrative, 110 
utilizing results of, 113-117 
Reading and MA, 75 
Reading age, 51, 97 
Reading readiness, 31 
in basic test program, 245 
Reading tests for upper grades, 
251 
Records, cumulative, 234 
Relative brightness, 77 
Reliability, 20 
Reporting practices, 258 
Rorschach blots, 136 
Sampling, adequacy of, 23
in test standardization, 47-49
wide, need for, 47 
Scale of Social Distance, 175 
Scales, 8
(See also Attitude scales; Rating scales)
Scattergram, 274 
Score, derived, 45 
meaning of, 46 
raw, 46 
Scoring, techniques for, 215 
formulas, 207, 218 
stencils, 217 
Selection of tests, 28 
Self-appraisal, pupil, 231 
Semi-interquartile range, 273 
Skills, learning, development of, 
114 
testing of, 90 
Social acceptability, factors related to, 143
significance of, 143 
study of, 145 
Social adjustment, approaches to 
study of, 145 
evaluation of, 252 
and school progress, 143 
Social Distance Scale, 175
Social relationships and personality, 121
Socio-economic status and intelligence, 80
Sociogram, 152-155 
analysis of results, 156 
individual representation, 156
meaning of, 152 
suggestions for drawing, 154 
values of, 155 
Sociometric method, choice 
blank, 149 
nature of, 145 
number of choices, 148 
results, scoring of, 149 


Sociometric method, results, tabulation of, 149-152
use of, 158-160 
values of, 146 
Sociometric questions, 146-148
Sociometry, 145 
in basic test program, 253 
questions used in, 146 
suggestions for using, 147 
tabulation sheet, 151 
uses of, 155-159 
Standard deviation, 58 
as basis for units of measurement, 59-61
nature of, 60 
in relation to standard-score 
norms, 60-63
Standard score, 57-63, 97
advantages of, 63 
derivation of, 57-59 
and percentile rank, 62 
scales, 97 
uses and limitations, 63 
Standardization, 10, 60 
related to teacher-made tests, 
218 
shortcomings of, 204 
steps in, 48 
Strong Vocational Interest Blank, 
167 
Subject-matter tests, 254 
Survey testing, 91 


Teacher-made tests, 202 
set-ups for, 214 
values of, 204 
Teacher-pupil conferences, 134, 
233 
Teacher-pupil-parent conferences, 233
Test construction, problems in, 48
suggestions for, 211
Test data, organization of, 262
Test items, analysis of, 91 
Test results, use of, 113 
Test scores, ranking of, 262 
Testing program, check list for, 
255 
Tests, achievement, types of, 90 
adequacy of, 23 
administration of, rules for, 
111 
attitudes toward, 3 
committee on, 29 
economy of, 24, 91 
equivalent, 22 
and instruction, 6 
interpretation of, 112 
manuals, 25 
meaning of, 1 
performance, 7 
publishers of, 34 
Tests, purposes of, 7
selection of, 28 
precautions in, 40 
standardization of, 48
types of, 6
uses and limitations of, 2
verbal, 7 
(See also specific tests) 


Thinking, factors in, 92 
Traits, personality, 125
True-false questions, construction of, 208
defects of, 207 


Validity, 15 
Verbal test, 7
Vocabulary age, 51 
Vocational Interest Blank, 167 


